Via TechDirt, the Wall Street Journal is reporting that Harper Collins is going to scan its books and provide the digital scans for use by search engines such as Google.
I don’t get it.
Presumably this was contemplated as a response to Google Book Search, but I don’t really see how it clarifies or addresses any of the concerns raised by the lawsuit over it. The question at issue in that case is whether Google has the right to scan and index publishers’ books without their permission. Obviously, if Harper Collins is giving Google the digital copies itself, then it’s implicitly giving Google permission, which kind of makes the lawsuit a moot point, doesn’t it?
Perhaps Harper Collins’s executives think that scanning the books themselves and letting Google “borrow” the digital copies would somehow make the process more secure. But that doesn’t make a lot of sense. This program simply changes the source of Google’s data. It doesn’t do anything, as far as I can tell, to change how and where the data is stored within Google’s search engine. Google is still going to need to keep local copies of the books (or at least indexes of them, from which the books can easily be reconstructed) on its servers for performance reasons, so it’s not likely to reduce the total number of copies in circulation.
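To make the parenthetical about indexes concrete, here is a minimal sketch of a positional index, the kind of structure that lets a search engine answer phrase queries. It is an illustration only: the function names are my own, the tokenizer just splits on whitespace, and I'm not claiming this is how Google actually stores anything. The point is simply that an index recording where every word occurs contains enough information to rebuild the text.

```python
from collections import defaultdict


def build_positional_index(text):
    """Map each word to the list of positions where it occurs (hypothetical helper)."""
    index = defaultdict(list)
    for position, word in enumerate(text.split()):
        index[word].append(position)
    return dict(index)


def search(index, word):
    """Return the positions at which a word occurs, or an empty list."""
    return index.get(word, [])


def reconstruct(index):
    """Rebuild the word sequence from the index alone."""
    total_words = sum(len(positions) for positions in index.values())
    words = [None] * total_words
    for word, positions in index.items():
        for position in positions:
            words[position] = word
    return " ".join(words)


book = "it was the best of times it was the worst of times"
index = build_positional_index(book)

# The index answers searches, and it also gives the whole text back.
hits = search(index, "times")
rebuilt = reconstruct(index)
```

In this toy version the rebuilt text matches the original word for word (only the exact whitespace is lost), so holding the index is, for practical purposes, holding a copy of the book.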
My guess is that HC’s leadership simply isn’t thinking clearly about the way digital content works. If the “digital copies” in question were physical books, it would make sense for HC to keep the physical copies in its warehouse and “lend” them to Google and others to use and then “give back.” That would increase security because “the originals” would always stay in HC’s possession. But digital data doesn’t work like that. Digital data isn’t “moved,” it’s copied. So when Google “borrows” HC’s digital books, it is, in fact, making a copy of them. That copy is just as good as the original, and every bit as much a security risk.
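The copy-versus-move point can be shown in a few lines. This is a sketch under made-up assumptions: the two temporary directories stand in for the publisher's and the search engine's servers, and `lend` is a hypothetical name for what "borrowing" a digital book amounts to in practice.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def lend(original, borrower_dir):
    """'Lend' a file to a borrower: handing it over means copying its bytes."""
    destination = Path(borrower_dir) / Path(original).name
    shutil.copyfile(original, destination)
    return destination


with tempfile.TemporaryDirectory() as publisher, \
        tempfile.TemporaryDirectory() as search_engine:
    original = Path(publisher) / "book.txt"
    original.write_text("Call me Ishmael.", encoding="utf-8")

    borrowed = lend(original, search_engine)

    # The publisher still has its file, and the borrower's file is identical.
    original_still_exists = original.exists()
    hashes_match = sha256_of(original) == sha256_of(borrowed)
```

After the "loan," both files exist and their hashes match, so there is no meaningful sense in which one of them is "the original" and the other a lesser derivative.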
The other possibility is that they’re hoping the courts will be similarly confused by the analogy. Even though it doesn’t make any sense in a digital context, it’s possible that a judge will be persuaded that allowing the copyright holder to hold “the original” copy of a digital book is more secure or otherwise more legitimate than allowing Google to create “its own” original. HC’s move might be savvy legal strategy even if it doesn’t make any sense from a technical perspective.
Update: Jerry suggests that licensing this database to smaller search engines that lack the resources to scan the books on their own could be a nice revenue stream, which is an excellent point. However, I don’t see how that helps “protect authors’ rights,” which is what most news stories claim the point is. This story, for example, quotes an HC executive complaining that there are “too many digital copies” of the books around, which I think fundamentally misunderstands how search engines work. A good search engine needs to be able to do full-text searches of the book, and to do that, you almost certainly need to have a copy of the full text on your server. If the goal of this project is to protect authors’ rights, I still say they’re barking up the wrong tree.