Software Patent of the Week: Hash Functions Are Not Novel

by on January 17, 2007 · 18 comments

Every week (more or less), I look at a software patent that’s been in the news. You can see previous installments in the series here. There haven’t been any big patent disputes in the news in the last couple of weeks, so this week we’ll look at a patent at the center of a lawsuit Altnet filed against Streamcast last August. You can read about the long and tangled history of the two companies in the link above.

Here is one of the patents at issue in the case. It covers “Data processing system using substantially unique identifiers to identify data items, whereby identical data items have the same identifiers.” Here’s the patent’s description of how it differs from the prior art:

In all of the prior data processing systems the names or identifiers provided to identify data items (the data items being files, directories, records in the database, objects in object-oriented programming, locations in memory or on a physical device, or the like) are always defined relative to a specific context. For instance, the file identified by a particular file name can only be determined when the directory containing the file (the context) is known. The file identified by a pathname can be determined only when the file system (context) is known. Similarly, the addresses in a process address space, the keys in a database table, or domain names on a global computer network such as the Internet are meaningful only because they are specified relative to a context.

In prior art systems for identifying data items there is no direct relationship between the data names and the data item. The same data name in two different contexts may refer to different data items, and two different data names in the same context may refer to the same data item.

In addition, because there is no correlation between a data name and the data it refers to, there is no a priori way to confirm that a given data item is in fact the one named by a data name. For instance, in a DP system, if one processor requests that another processor deliver a data item with a given data name, the requesting processor cannot, in general, verify that the data delivered is the correct data (given only the name). Therefore it may require further processing, typically on the part of the requestor, to verify that the data item it has obtained is, in fact, the item it requested.

This is laughably off base. What this patent is describing is hashing. A hash function takes any file and produces a short, fixed-length “fingerprint” (usually called a hash value or digest) in such a way that two different files are unlikely to have the same fingerprint. (When two different files have the same hash value, this is known as a collision. A good hash function minimizes collisions for the inputs it is likely to see.)
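To make the comparison concrete, here is a minimal sketch of the idea the patent claims: a store that names each item by a hash of its contents, so identical items always get the same identifier regardless of any directory, path, or other context. It uses SHA-1 from Python’s standard `hashlib` module; the `ContentStore` class and its method names are hypothetical, made up for illustration.

```python
import hashlib


class ContentStore:
    """Hypothetical store that names each data item by the SHA-1 hash of its bytes."""

    def __init__(self) -> None:
        self._items: dict[str, bytes] = {}

    @staticmethod
    def identifier(data: bytes) -> str:
        # The identifier depends only on the content itself,
        # not on where or under what name the data is stored.
        return hashlib.sha1(data).hexdigest()

    def put(self, data: bytes) -> str:
        key = self.identifier(data)
        # Storing identical data twice yields the same key and a single stored copy.
        self._items[key] = data
        return key

    def get(self, key: str) -> bytes:
        return self._items[key]

    def __len__(self) -> int:
        return len(self._items)


store = ContentStore()
a = store.put(b"hello, world")
b = store.put(b"hello, world")
c = store.put(b"goodbye, world")
print(a == b)      # True: identical contents, identical identifiers
print(a == c)      # False: different contents, different identifiers (barring a collision)
print(len(store))  # 2
```

Nothing here goes beyond a standard library call and a dictionary, which is the point: content-derived identifiers fall straight out of an ordinary hash function.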

MD5 is a popular hash function that was created in 1991. According to Wikipedia, it has recently begun to be superseded by SHA-1, which was published in 1995 and is generally considered to be more secure.
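Both functions ship with Python’s standard `hashlib` module, which says something about how routine they are. The sketch below hashes the same arbitrary sample sentence with each: MD5 produces a 128-bit digest and SHA-1 a 160-bit one. It also shows that appending a single character produces a completely different digest.

```python
import hashlib

data = b"The quick brown fox jumps over the lazy dog"

md5_digest = hashlib.md5(data).hexdigest()
sha1_digest = hashlib.sha1(data).hexdigest()

# Each hex character encodes 4 bits, so len * 4 gives the digest size in bits.
print("MD5:  ", md5_digest, len(md5_digest) * 4, "bits")
print("SHA-1:", sha1_digest, len(sha1_digest) * 4, "bits")

# Changing the input by one character yields an unrelated-looking digest.
tweaked = hashlib.sha1(data + b".").hexdigest()
print(tweaked != sha1_digest)  # True
```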

And these things aren’t confined to the laboratory. People publishing software online regularly include a cryptographically signed hash value for the code so that users can verify that a malicious middleman hasn’t replaced the software with his own, compromised version. This was already common practice when I started college in 1998. So the statement that in the prior art, “the requesting processor cannot, in general, verify that the data delivered is the correct data (given only the name)” is completely false.
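The check a downloader performs is exactly the verification the patent says prior-art systems could not do. Here is a minimal sketch, assuming the publisher posts a SHA-1 hex digest alongside the file and that the digest itself reached you over a trusted channel (checking the signature on it is a separate step not shown). The `verify_download` helper and the sample byte strings are hypothetical.

```python
import hashlib


def verify_download(data: bytes, published_sha1_hex: str) -> bool:
    """Hypothetical helper: True if `data` hashes to the digest the publisher advertised.

    Assumes the published digest is itself trustworthy; this sketch does not
    verify any signature on it.
    """
    actual = hashlib.sha1(data).hexdigest()
    # hexdigest() is lowercase, so normalize the published value before comparing.
    return actual == published_sha1_hex.strip().lower()


original = b"installer bytes as released by the author"
published = hashlib.sha1(original).hexdigest()  # what the author posts next to the file

tampered = original.replace(b"author", b"attacker")
print(verify_download(original, published))  # True
print(verify_download(tampered, published))  # False
```

Given only the identifier (the digest), the requester can confirm that the bytes it received are the bytes it asked for.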

It’s really quite remarkable that no one at the patent office has ever heard of a hash function. This is not an obscure concept in computer science. Apparently, the people reviewing software inventions at the patent office are not even casually acquainted with standard computer programming techniques.