The Patent System is a Hashtable without a Hash Function

Posted on August 27, 2008 · 22 comments

[This post will be geekier than average. Apologies in advance to non-programmers]

One of the interesting aspects of Intellectual Property and Open Source is the frequent use of programming metaphors to explain legal concepts. Given the audience, it’s a clever approach. Most of the analogies work well. A few fall flat.

I found one analogy particularly illuminating, albeit not in quite the way Lindberg intended. He analogizes the patent system to memoization, the programming technique in which a program stores the results of past computations in a table to avoid having to re-compute them. If computing a value is expensive, but recalling it from a table is cheap, memoization can dramatically speed up computation. Lindberg then compares this to the patent system:

The patent system as a whole can be compared to applying memoization to the process of invention. Creating a new invention is like calling an expensive function. Just as it is inefficient to recompute the Fibonacci numbers for each function invocation, it is inefficient to force everyone facing a technical problem to independently invent the solution to that problem. The patent system acts like a problem cache, storing the solutions to specific problems for later recall. The next time someone has the same problem, the saved solution (as captured by the patent document) can be used.

Just as with memoization, there is a cost associated with the patent process, specifically, the 20-year term of exclusive rights associated with the patent. Nevertheless, the essence of the utilitarian bargain is that granting temporary exclusive rights to inventions is ultimately less expensive than forcing people to independently recreate the same invention.

The caveat at the beginning of the second paragraph is huge. In the software industry, at least, any patent filed in the 1980s is virtually worthless today. But even setting that point aside, Lindberg’s analogy offers a helpful way to explain why patents are a bad fit for the software industry: it’s like implementing memoization using a lookup table without a hash function.

The reason memoization works is that we have a data structure called a hash table that allows programs to look up key:value pairs in a constant amount of time, no matter how many key:value pairs we have. (In big-O notation, lookups are O(1).) However, if someone who didn’t know about hash functions tried to implement a lookup table, his program might be stuck examining key:value pairs one at a time, leading to a lookup time proportional to the number of key:value pairs (in big-O notation, O(n)). Using such a lookup table in large computations would cause terrible performance problems. For example, if we were using memoization to compute the Fibonacci sequence with a properly executed hashtable, the execution time would be proportional to the size of the input (O(n)). But with our poorly executed lookup table, the execution time would be proportional to the square of the input (O(n²)). The net result is that if the former algorithm can compute the millionth Fibonacci number in a second, the latter would take on the order of a week.
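For the programmers in the audience, here is a minimal Python sketch of that comparison (mine, not Lindberg's). The names `DictTable`, `ListTable`, and `fib_memo` are made up for illustration. `DictTable` wraps Python's built-in hash table, while `ListTable` is the table someone might write without a hash function; each counts the work it does in `probes`.

```python
class DictTable:
    """Memo table backed by Python's built-in hash table (dict)."""

    def __init__(self):
        self._data = {}
        self.probes = 0  # number of lookups performed

    def get(self, key):
        self.probes += 1  # one hashed lookup, however large the table is
        return self._data.get(key)

    def put(self, key, value):
        self._data[key] = value


class ListTable:
    """Memo table with no hash function: a plain list of (key, value) pairs."""

    def __init__(self):
        self._pairs = []
        self.probes = 0  # number of stored pairs examined

    def get(self, key):
        for k, v in self._pairs:  # examine the pairs one at a time
            self.probes += 1
            if k == key:
                return v
        return None

    def put(self, key, value):
        self._pairs.append((key, value))


def fib_memo(n, table):
    """Return the n-th Fibonacci number, caching results in `table`."""
    if n < 2:
        return n
    cached = table.get(n)
    if cached is not None:
        return cached
    value = fib_memo(n - 1, table) + fib_memo(n - 2, table)
    table.put(n, value)
    return value


fast, slow = DictTable(), ListTable()
assert fib_memo(300, fast) == fib_memo(300, slow)
print("dict probes:", fast.probes, "list probes:", slow.probes)
```

Both tables produce the same answer, but the work done by `ListTable` grows roughly with the square of `n` while the work done by `DictTable` grows linearly. I kept `n` small only to stay under Python's default recursion limit.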

OK, so what does this have to do with the patent system? For most industries, the patent system works like a lookup table with performance O(n), where n is the number of patents in your industry. There’s no hashing function (no mechanism to quickly home in on the specific patent relevant to your invention), so your only option is to hire a patent lawyer to review every potentially relevant patent, one at a time. And of course, doing these lookups is mandatory. That means that the patent system as a whole has a cost of O(n²) with respect to the number of firms (holding the patenting rate within each firm constant). Doubling the number of firms both doubles the number of patents and doubles the number of eyeballs that must examine each patent.
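The scaling argument fits in a few lines of Python. This is only a back-of-the-envelope sketch: the function name `total_review_cost` and the figure of 10 patents per firm are assumptions picked for illustration, not empirical numbers.

```python
def total_review_cost(num_firms, patents_per_firm=10):
    """Count patent reviews if every firm reads every other firm's patents.

    patents_per_firm is a made-up constant standing in for a fixed
    patenting rate within each firm.
    """
    patents_held_by_others = (num_firms - 1) * patents_per_firm
    return num_firms * patents_held_by_others


for firms in (100, 200, 400):
    print(firms, "firms ->", total_review_cost(firms), "patent reviews")
```

Each doubling of the number of firms roughly quadruples the number of reviews, which is the O(n²) behavior described above.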

In a competitive industry like the software industry, the value of this particular O(n²) cost function is probably larger than the entire revenues of the software industry. That is, if we really required every software company to read every patent issued to every other software company, the legal bills would bankrupt the entire industry. Since it’s essentially impossible to comply with the law, most software companies simply don’t try. And so we get endless stories about companies inadvertently infringing other companies’ patents. This isn’t evidence of malice or incompetence on the part of the infringing company. It just reflects the brute force of mathematics.

It’s worth noting that some industries do have a usable hashing function. The pharmaceutical industry has chemical formulas, which may allow O(1) patent lookups. You type the formula of your drug into a database and it pops out a small number of patents that relate to that drug. This may be why the pharmaceutical industry is so much more enthusiastic about patents than most other industries. The cost of the patent system is O(n) with respect to the number of firms (and the number of pharma firms is small), whereas for most other industries it’s O(n²).
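Here is a rough sketch of what "using the formula as a hash key" could look like in Python. Everything in it is hypothetical: the patent identifiers are placeholders, the records are invented for the example, and real patent databases are of course far more complicated than a dictionary.

```python
from collections import defaultdict

# Placeholder records of the form (patent id, chemical formula).
# Both the ids and the pairings are made up for illustration.
PATENTS = [
    ("patent-001", "C9H8O4"),
    ("patent-002", "C8H9NO2"),
    ("patent-003", "C9H8O4"),
    ("patent-004", "C13H18O2"),
]


def build_index(patents):
    """Group patent ids by formula so a lookup is a single dict access."""
    index = defaultdict(list)
    for patent_id, formula in patents:
        index[formula].append(patent_id)
    return index


INDEX = build_index(PATENTS)


def relevant_patents(formula):
    """Return the patents filed under `formula` without scanning them all."""
    return INDEX.get(formula, [])


print(relevant_patents("C9H8O4"))
```

The point is that the formula itself tells you where to look. Software has no comparable key, so the only fallback is to scan every record.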

  • http://gondwanaland.com/mlog mlinksva

    Regardless of lookup costs, there's a more fundamental problem with

    “The patent system acts like a problem cache, storing the solutions to specific problems for later recall.”

    First there is the implication that solutions would not be stored (ie every time a problem is faced a solution would have to be found anew by the entity facing the problem) in absence of a patent system, which seems absurd.

    Second, there is the implication that the content of patents (which are after all what is stored and looked up by the system) contain solutions in a form actually useful to people who have problems, which also seems absurd, though admittedly I know nothing about pharma development practice.

    So, in any field, does the patent system actually serve as a solution lookup mechanism for practitioners (and not merely for the purpose of finding out whether a potential solution is patented!)?

  • http://www.blaynesucks.com Aaron Massey

    But even setting that point aside, Lindberg’s analogy provides a helpful analogy to explain why patents are a bad fit for the software industry: it’s like implementing memoization using a lookup table without a hash function.

    To me the more apt memoization analogy to the patent system focuses on the overlapping subproblems that you hinted at with the Fibonacci sequence example. Essentially, the patent system attempts to use a dynamic programming technique to speed up innovation. The core problem is that the 20 intervening years during which the patent is valid make whatever subproblem the patent solved worthless. As you mentioned, there are essentially no valuable software patents from the 1980s. Thus, no one would have a use for these patents irrespective of their ability to perform a lookup. I think this gets more to the core of why software patents really hurt innovation in software than the lookup concern does.

    Although the lookup concerns are secondary in my mind, they are certainly important. For example, consider what would happen if software patents were only valid for three years. Some of the subproblems those patents solved would still be relevant and important. This means that being able to find a fast solution to those subproblems would be useful. However, without being able to perform an easy lookup, the software industry wouldn't be able to make use of any valuable solutions to the subproblems it may be dealing with. Thus, even if the core problem with the patent system didn't exist, the lookup concerns you mentioned would still render the current patent system useless.

    This also fits with your discussion of the pharmaceutical industry. They actually benefit from the patent system not simply because they can perform a reasonable lookup, but also because they can potentially still make use of the solutions to subproblems they find to solve larger problems.

  • http://bennett.com/blog Richard Bennett

    Apparently you dudes are unaware of the fact that modern high-tech tools exist that make it easy to search the on-line patent database. It's kinda like this “Google” thing that searches web sites, only easier because people write patents in technical language.

    But there's no such thing as a “software patent” anyhow, just a lot of computer and electronics patents. But keep searching for that silver bullet, you may find it one day.

  • http://www.tc.umn.edu/~leex1008 Tim Lee

    Sometimes I wonder if you're trolling us deliberately. How exactly would a small firm's patent lawyer come up with the list of search terms that will generate a list of all potentially-infringing patents? Remember that even a small software company will have thousands of lines of code, each of which could potentially be infringing.

  • http://www.blaynesucks.com Aaron Massey

    Seriously? This is supposed to be easier than Google? Honestly, I think Google's engine is better and more accurate. Of course, if you're talking about the tools that expensive lawyers use, then you're pretty much already in a different garage than the one in which most software engineers tinker.

    Though, I do wish you were right about the existence of software patents. (Business method patents should probably also disappear for that matter.) I kinda like Ben Klemens' proposals, but we're certainly not there yet.

  • http://bennett.com/blog Richard Bennett

    Tim made the claim that there's no search function for patents, but clearly there are several: Google patents, the USPTO has a free one, and firms like MicroPatent have some that you can pay for. Any law firm that files patents will have a MicroPatent account, or something equivalent, if they're competent. So Tim's claim is false.

    Now we can have an interesting discussion about improving the searchability of patents if you want, but let's get over the idea that the Patent DB isn't searchable at all.

  • http://www.tc.umn.edu/~leex1008 Tim Lee

    Um, the word “search” doesn't appear in my post.

  • http://bennett.com/blog Richard Bennett

    Right, you used the term “hash function” when you were talking about searching a hash table search function. You used the term incorrectly, but I didn't call you on it and gave you the benefit of the doubt. And BTW, hash tables don't guarantee lookup time, because you can have hash collisions. There are other data structures, such as Patricia Tries, that do provide bounded lookup times; we use them for things like IP and MAC addresses.

    Don't be so picky.

  • http://www.tc.umn.edu/~leex1008 Tim Lee

    The point is that I never claimed that there are no patent search engines. Rather, I was claiming that for any given technology, there's no straightforward way to determine which patents that technology might infringe other than reading all the patents in that technology class. A search engine doesn't help because there's no way to generate a list of search terms that will turn up all relevant patents.

    Also, the function might not be strictly O(1), but with a well-designed hashtable the average lookup time is basically independent of the number of elements. In any event, this doesn't really change my point. Substitute another data structure with constant-time lookups if you like.

  • http://bennett.com/blog Richard Bennett

    If what you said were true, there would be no new patents, Tim.

    What actually happens is that people search the patent data base using a variety of terms, find close matches, and ensure that their patent doesn't infringe by carefully distinguishing it. If you read any patents, you'll no doubt notice a list of similar patents and the explanation of why the patent in question is different. It's not a fool-proof system, but people seem to handle it.

    So rather than there being no “hash function” there are several.

  • http://www.tc.umn.edu/~leex1008 Tim Lee

    It's not a fool-proof system.

    That's quite an understatement.

  • http://bennett.com/blog Richard Bennett

    Yes, we all know the patent system is all screwed up. What we need are constructive suggestions for fixing it.
