Search, cache, and copyright

February 13, 2007

Google has lost its copyright appeal against Belgian newspaper publishers. There seem to be conflicting reports about what exactly Google was found liable for. Here’s the WSJ:

A Belgian court ruled Tuesday that Internet search engine Google Inc. violated Belgian copyright law when it published snippets and links to Belgian newspapers on its Web site without permission.

And here’s the AP:

A Brussels court ruled in favor of Copiepresse, a copyright protection group representing 18 mostly French-language newspapers that complained the search engine’s “cached” links offered free access to archived articles that the papers usually sell on a subscription basis.

Snippets and entire cached pages are very different things. But whichever it was, this case highlights how unsettled copyright law is as it applies to search engines (and I’ll limit myself to just the U.S.). As for snippets, sure, there’s Kelly v. Arriba Soft, which found that indexing photographs and displaying their thumbnails is a fair use. But that’s just one circuit’s opinion: persuasive, though not controlling, in other circuits. Then there’s Perfect 10 v. Google, which cuts in the opposite direction.

In the case of cached pages (but also snippets), the heart of the problem is a collision between the norms of the internet and the laws of meatspace, which were developed in a different context. On the web, the norm is opt-out. If you don’t want your site cached, or even indexed, by a search engine, you can say so in your robots.txt file. In the “real world,” you have to acquire permission before you can copy and distribute a copyrighted work. The law right now, however, only incorporates the latter. As I read the Copyright Act, respect for robots.txt doesn’t get you off the hook for infringement. Perhaps it should.
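
For readers who haven’t seen one, robots.txt is just a plain-text file sitting at the root of a site, and opting out takes about two lines of effort. A minimal sketch (the directives are standard, though crawlers are free to ignore them, which is part of the legal problem):

    User-agent: *
    Disallow: /

That tells every well-behaved crawler to stay out entirely. A publisher who wants to be indexed but not cached can use a meta tag in each page’s HTML, at least as Google implements it:

    <meta name="robots" content="noarchive">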

Maybe courts will carve out a search engine fair use exception the same way they came up with the “time shifting” exception in Sony v. Universal. A) I wouldn’t hold my breath, and B) I’d rather see a clear restatement of the law that gives legal protection to internet norms. I’d like to see this not just because I’m biased toward the wide availability of information, but also because it’s economically efficient. As Coase taught us, absent transaction costs, you get the same result whether you use an opt-in or an opt-out system. Given that there are in fact transaction costs, we should pick the system that minimizes them. I don’t think it’s controversial to say that the vast majority of web publishers want to be indexed and cached by Google and other search engines. It’s therefore more efficient to impose the small cost of creating a robots.txt file on the minority of publishers who don’t want to be indexed or cached than to make Google ask permission from every one of the millions of websites out there, or to require all those websites to opt in to every search engine.
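
To put rough, purely hypothetical numbers on the Coasean point: suppose there are 100 million websites, 1% of which object to being indexed, and that writing a robots.txt file (or negotiating permission) costs ten minutes per site. Then:

    Opt-out: 1,000,000 sites x 10 min  =  ~83 work-years, once
    Opt-in: 99,000,000 sites x 10 min  =  ~8,250 work-years, per search engine

The numbers are invented, but the asymmetry isn’t: whenever most parties want the default outcome, the cheaper rule is the one that makes the exceptions speak up.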
