Software Patent of the Week: Hash Functions Are Not Novel

by Tim Lee on January 17, 2007

Every week (more or less), I look at a software patent that’s been in the news. You can see previous installments in the series here. There haven’t been any big patent disputes in the news the last couple of weeks, so this week we’ll look at a patent that’s at the center of a lawsuit that was filed last August by Altnet against Streamcast. You can read about the long and tangled history of the two companies in the link above.

Here is one of the patents at issue in the case. It covers “Data processing system using substantially unique identifiers to identify data items, whereby identical data items have the same identifiers.” Here’s a description of how the patent differs from prior art:

In all of the prior data processing systems the names or identifiers provided to identify data items (the data items being files, directories, records in the database, objects in object-oriented programming, locations in memory or on a physical device, or the like) are always defined relative to a specific context. For instance, the file identified by a particular file name can only be determined when the directory containing the file (the context) is known. The file identified by a pathname can be determined only when the file system (context) is known. Similarly, the addresses in a process address space, the keys in a database table, or domain names on a global computer network such as the Internet are meaningful only because they are specified relative to a context.

In prior art systems for identifying data items there is no direct relationship between the data names and the data item. The same data name in two different contexts may refer to different data items, and two different data names in the same context may refer to the same data item.

In addition, because there is no correlation between a data name and the data it refers to, there is no a priori way to confirm that a given data item is in fact the one named by a data name. For instance, in a DP system, if one processor requests that another processor deliver a data item with a given data name, the requesting processor cannot, in general, verify that the data delivered is the correct data (given only the name). Therefore it may require further processing, typically on the part of the requestor, to verify that the data item it has obtained is, in fact, the item it requested.

This is laughably off base. What this patent describes is hashing. A hash function takes any file and produces a short, fixed-length “fingerprint” (the hash value, or digest) in such a way that any two different files are unlikely to have the same value. (When two different files have the same hash value, this is known as a collision. A good hashing algorithm minimizes collisions for a given set of likely data inputs.)
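To make that concrete, here is a minimal Python sketch using the standard library's hashlib module. The function name content_id and the sample file names and contents are made up for illustration, and SHA-1 is just one widely available hash function. Nothing here is taken from the patent itself.

```python
import hashlib


def content_id(data: bytes) -> str:
    """Return a hex identifier derived only from the bytes themselves."""
    return hashlib.sha1(data).hexdigest()


# Hypothetical "files": two different names with identical contents,
# plus one file whose contents differ.
files = {
    "report.txt": b"hello, world\n",
    "copy-of-report.txt": b"hello, world\n",
    "other.txt": b"goodbye, world\n",
}

ids = {name: content_id(data) for name, data in files.items()}

# Identical contents yield the same identifier, whatever the file is called.
same = ids["report.txt"] == ids["copy-of-report.txt"]
# Different contents yield different identifiers (barring a collision).
different = ids["report.txt"] != ids["other.txt"]

print(same, different)
```

The identifier depends only on the bytes, not on the file name or the directory it lives in, which is the property the patent presents as missing from the prior art.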

MD5 is a popular hash function that was created in 1991. According to Wikipedia, it has recently begun to be superseded by SHA-1, which was created in 1995, and is generally considered to be more secure.

And these things aren’t confined to the laboratory. People publishing software online regularly include a cryptographically signed hash value for the code to ensure that a malicious middleman hasn’t replaced the software with his own, compromised version. This was already common practice when I started college in 1998. So the statement that, in the prior art, “the requesting processor cannot, in general, verify that the data delivered is the correct data (given only the name)” is completely false.
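The check a downloader performs can be sketched in a few lines of Python. This is an illustration under assumptions, not any particular project's procedure: the function name matches_published_hash and the sample byte strings are hypothetical, and the sketch assumes the published hash value was obtained through a channel the downloader already trusts (it does not verify the signature on the hash).

```python
import hashlib
import hmac


def matches_published_hash(data: bytes, published_hex: str) -> bool:
    """Check that received bytes hash to the value the publisher advertised."""
    actual = hashlib.sha1(data).hexdigest()
    # compare_digest compares the two hex strings in constant time.
    return hmac.compare_digest(actual, published_hex.lower())


# Hypothetical publisher side: compute the hash of the real release.
release = b"pretend this is an installer"
published = hashlib.sha1(release).hexdigest()

# Hypothetical downloader side: one honest copy and one tampered copy.
honest_copy = b"pretend this is an installer"
tampered_copy = b"pretend this is an installer, plus malware"

ok = matches_published_hash(honest_copy, published)
bad = matches_published_hash(tampered_copy, published)

print(ok, bad)
```

SHA-1 is used only because it is one of the functions mentioned above. Another hashlib function such as hashlib.sha256 could be swapped in without changing anything else.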

It’s really quite remarkable that no one at the patent office has ever heard of a hash function. This is not an obscure concept in computer science. Apparently, the people reviewing software inventions at the patent office are not even casually acquainted with standard computer programming techniques.

Posted in: Patents

  • Noel, if the USPTO would actually grant this patent, then its standards are what are subjective and broken. In this case, it's very, very obvious. There are only two ways to uniquely identify something. Qualify it, or give it a unique id.
  • Noel, I think you prove my point. I freely admit that I'm not an expert on the ins and outs of the USPTO's (or the Federal Circuit's) obviousness and novelty rules. However, given that my whole point is that those rules are screwed up, I don't see how that's relevant. Any patent system that grants this patent is broken. Knowing the details about why it's screwed up might help us craft a solution, but at the moment my goal is simply to document the extent of the problem. If and when I do a more formal paper on what's wrong with software patents and how to fix them, I'll be sure to learn the gory details about how the current rules work.
  • Well, Snorre, I have some friends who are or used to be at the USPTO, and am not going to criticize their work. However, the USPTO's standards for novelty and obviousness are, as you flamboyantly suggest, a bit outdated.
  • Snorre
    I'd guess that the USPTO bureaucrats think of novelty and obviousness like this: "WOW IT'S THE INTARWEBS THEY ARE FAST ZOOOM POW WOW A ROCKET IN SPACE HOW DO THEY THINK THIS UP!"

    Maybe some of them have some kids who use bittorrent or pretty much any other p2p file sharing method, and could enlighten them about hashing? Or, you know, computers in general?
  • Well, I'm not caught up on USPTO re-examination procedures, but then I'm not doing a patent of the week series where I need to invent my own standards for novelty and obviousness:):)
  • Noel, enlighten me. How would the USPTO evaluate this patent, in light of the fact that hashing is a well-known and widely documented computer science concept?
  • Tim, there is actually a standard for novelty in patent doctrine. Even though some technologies may not be novel to you, it may be more productive to talk about novelty in the context of how a court or the USPTO would consider it. The same goes with obviousness. Even though you may call some things obvious, you're using your subjective standard rather than anything dealing with our patent system.
  • As is typical for software patents, it's painfully difficult to tell by reading it what is being patented, even if you're an experienced software developer. I wonder if that's part of the reason why so many absurd software patents are issued.
  • And then there's CRC, which came before MD5 and is used for validity checksums in the popular Zip archive format. (And due to its small output hash length, that's about all it's good for.)