The Patent System is a Hashtable without a Hash Function

August 27, 2008

[This post will be geekier than average. Apologies in advance to non-programmers.]

One of the interesting aspects of Van Lindberg's Intellectual Property and Open Source is its frequent use of programming metaphors to explain legal concepts. Given the audience, it's a clever approach. Most of the analogies work well; a few fall flat.

I found one analogy particularly illuminating, albeit not quite in the way Lindberg intended. He compares the patent system to memoization, the programming technique in which a program stores the results of past computations in a table so it doesn't have to recompute them. If computing a value is expensive but recalling it from a table is cheap, memoization can dramatically speed up computation. Lindberg then applies this to the patent system:

The patent system as a whole can be compared to applying memoization to the process of invention. Creating a new invention is like calling an expensive function. Just as it is inefficient to recompute the Fibonacci numbers for each function invocation, it is inefficient to force everyone facing a technical problem to independently invent the solution to that problem. The patent system acts like a problem cache, storing the solutions to specific problems for later recall. The next time someone has the same problem, the saved solution (as captured by the patent document) can be used.

Just as with memoization, there is a cost associated with the patent process, specifically, the 20-year term of exclusive rights associated with the patent. Nevertheless, the essence of the utilitarian bargain is that granting temporary exclusive rights to inventions is ultimately less expensive than forcing people to independently recreate the same invention.

The caveat at the start of that second paragraph is a huge one. In the software industry, at least, any patent filed in the 1980s is virtually worthless today. But even setting that point aside, Lindberg's analogy helps explain why patents are a bad fit for the software industry: the patent system is like implementing memoization with a lookup table that has no hash function.

The reason memoization works is that we have a data structure called a hash table, which lets a program look up key:value pairs in a constant amount of time (on average), no matter how many pairs it holds. (In big-O notation, lookups are O(1).) However, if someone who didn't know about hash functions tried to implement a lookup table, his program might be stuck examining key:value pairs one at a time, making lookup time proportional to the number of pairs (in big-O notation, O(n)). Using such a table in large computations would cause terrible performance problems. For example, if we used memoization to compute the Fibonacci sequence with a properly implemented hash table, the execution time would be proportional to the size of the input (O(n)). But with our poorly implemented lookup table, each of those n steps requires scanning up to n entries, so the execution time would be proportional to the square of the input (O(n²)). The net result is that if the former algorithm can compute the millionth Fibonacci number in a second, the latter would take roughly a million times as long: on the order of a week or two.
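To make the difference concrete, here is a minimal Python sketch (not from the post) that runs the same top-down memoized Fibonacci function against two memo tables. `HashTable` wraps a Python dict; `LinearTable` is a hypothetical table with no hash function that scans its key:value pairs one at a time. The `work` counters are an assumed cost model: one unit per dict lookup, one unit per key examined during a scan.

```python
class HashTable:
    """Memo table backed by a Python dict (a hash table): O(1) average lookups."""

    def __init__(self):
        self.data = {}
        self.work = 0  # one unit per lookup: hashing jumps straight to the right slot

    def get(self, key):
        self.work += 1
        return self.data.get(key)  # None on a cache miss

    def put(self, key, value):
        self.data[key] = value


class LinearTable:
    """Hypothetical memo table with no hash function: O(n) lookups by scanning."""

    def __init__(self):
        self.pairs = []  # list of (key, value) pairs, in insertion order
        self.work = 0    # one unit per key examined during a scan

    def get(self, key):
        for k, v in self.pairs:
            self.work += 1
            if k == key:
                return v
        return None  # cache miss after examining every pair

    def put(self, key, value):
        self.pairs.append((key, value))


def fib(n, table):
    """Top-down memoized Fibonacci; `table` is any object with get/put."""
    cached = table.get(n)
    if cached is not None:
        return cached
    result = n if n < 2 else fib(n - 1, table) + fib(n - 2, table)
    table.put(n, result)
    return result


for n in (100, 200, 400):
    hashed, linear = HashTable(), LinearTable()
    a, b = fib(n, hashed), fib(n, linear)
    assert a == b  # both memo tables must produce the same Fibonacci number
    print(f"n={n}: hash table work={hashed.work}, linear table work={linear.work}")
# n=100: hash table work=199, linear table work=4949
# n=200: hash table work=399, linear table work=19899
# n=400: hash table work=799, linear table work=79799
```

For n >= 3 the hash-table version performs 2n - 1 lookups, while the linear table examines n(n - 1)/2 - 1 keys, so doubling n doubles the first count and roughly quadruples the second. The counters track lookup work only; they ignore the cost of adding ever-larger integers.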

OK, so what does this have to do with the patent system? For most industries, the patent system works like a lookup table with O(n) performance, where n is the number of patents in your industry. There's no hash function (no mechanism to quickly home in on the specific patents relevant to your invention), so your only option is to hire a patent lawyer to review every potentially relevant patent, one at a time. And of course, doing these lookups is mandatory. That means the patent system as a whole has a cost of O(n²) with respect to the number of firms (holding the patenting rate within each firm constant). Doubling the number of firms both doubles the number of patents and doubles the number of eyeballs that must examine each patent, so the total review cost quadruples.
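The quadrupling can be checked with a back-of-the-envelope Python model. It is a sketch under assumptions the post only implies: every firm patents at the same fixed rate, every firm must read every patent in the industry (its own included, for simplicity), and each reading costs one unit. The function name `total_review_cost` is hypothetical.

```python
def total_review_cost(firms, patents_per_firm, cost_per_review=1):
    """Hypothetical model: every firm reviews every patent in its industry.

    The industry holds firms * patents_per_firm patents, and each of the
    `firms` firms must review all of them, so the total work is
    patents_per_firm * firms**2, i.e. O(n^2) in the number of firms.
    """
    total_patents = firms * patents_per_firm
    return firms * total_patents * cost_per_review


for firms in (10, 20, 40):
    print(firms, total_review_cost(firms, patents_per_firm=5))
# 10 500
# 20 2000
# 40 8000
```

Each doubling of the number of firms multiplies the total review cost by four, even though no individual patent got any harder to read.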

In a competitive industry like software, the total cost implied by this O(n²) function is probably larger than the entire industry's revenues. That is, if we really required every software company to read every patent issued to every other software company, the legal bills would bankrupt the entire industry. Since it's essentially impossible to comply with the law, most software companies simply don't try. And so we get endless stories about companies inadvertently infringing other companies' patents. This isn't evidence of malice or incompetence on the part of the infringing company. It just reflects the brute force of mathematics.

It's worth noting that some industries do have a usable hash function. The pharmaceutical industry has chemical formulas, which may allow O(1) patent lookups: you type your drug's formula into a database and it returns a small number of patents that relate to that drug. This may be why the pharmaceutical industry is so much more enthusiastic about patents than most other industries. The cost of the patent system there is O(n) with respect to the number of firms (and the number of pharma firms is small), whereas for most other industries it's O(n²).
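A minimal Python sketch of the contrast, assuming a hypothetical database in which each patent is tagged with exactly one chemical formula. The records, patent IDs, and formulas below are synthetic placeholders generated for illustration, not real patents. Without an index, a search must examine every record; once the formulas are hashed into a dict, each firm's search is a single lookup.

```python
def make_patents(n):
    """Hypothetical industry: n patents, each keyed by a synthetic chemical formula."""
    return [(f"patent-{i}", f"C{i + 1}H{2 * i + 2}O") for i in range(n)]


def scan_lookup(formula, patents):
    """No hash function: read patents one at a time. Returns (matches, patents examined)."""
    matches = [pid for pid, f in patents if f == formula]
    return matches, len(patents)


def build_index(patents):
    """Hash each formula once, so later searches jump straight to the matching bucket."""
    index = {}
    for pid, formula in patents:
        index.setdefault(formula, []).append(pid)
    return index


patents = make_patents(10_000)
index = build_index(patents)
query = "C42H84O"                  # formula of the hypothetical drug being checked
print(scan_lookup(query, patents))  # (['patent-41'], 10000): every patent examined
print(index.get(query, []))         # ['patent-41']: a single dict lookup
```

Building the index takes one pass over all the patents, but if that pass is done once (say, by whoever maintains the database), each firm's own search costs about the same no matter how many patents exist, which keeps the industry-wide cost linear in the number of firms.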
