Today, Eli Dourado, Matt Sherman, and I launched PiracyData.org, a very simple site that tries to help answer the question: are the most-pirated movies each week available for legal streaming, digital rental, or digital purchase? We do this by mashing TorrentFreak’s weekly top-ten list of the most pirated movies on BitTorrent with Can I Stream It’s database of movie availability. The result is a single-page website that visualizes the results, as well as a downloadable dataset that will grow each week.
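For readers curious about the mechanics, the mash-up amounts to a join on a movie identifier. What follows is a minimal Python sketch of that idea, not the site's actual code: the titles, IDs, availability flags, and function names are all hypothetical, and the two data sources are stood in for by in-memory structures rather than real requests to TorrentFreak or Can I Stream It.

```python
# Illustrative only: all data and names below are made up for this sketch.
KINDS = ("streaming", "rental", "purchase")

# Stand-in for one week's top-pirated list.
TOP_PIRATED = [
    {"rank": 1, "title": "Example Movie A", "imdb_id": "tt0000001"},
    {"rank": 2, "title": "Example Movie B", "imdb_id": "tt0000002"},
    {"rank": 3, "title": "Example Movie C", "imdb_id": "tt0000003"},
]

# Stand-in for an availability lookup keyed by movie ID.
AVAILABILITY = {
    "tt0000001": {"streaming": False, "rental": True, "purchase": True},
    "tt0000002": {"streaming": False, "rental": False, "purchase": False},
    # tt0000003 is deliberately missing to show the fallback below.
}


def join_availability(top_list, availability):
    """Attach availability flags to each movie in the top list."""
    rows = []
    for movie in top_list:
        # A movie with no record is treated as unavailable in every form.
        flags = availability.get(movie["imdb_id"], {})
        row = dict(movie)
        for kind in KINDS:
            row[kind] = bool(flags.get(kind, False))
        row["available_legally"] = any(row[kind] for kind in KINDS)
        rows.append(row)
    return rows


def summarize(rows):
    """Count how many movies are available in each form, and in any form."""
    counts = {kind: sum(1 for r in rows if r[kind]) for kind in KINDS}
    counts["available_legally"] = sum(1 for r in rows if r["available_legally"])
    return counts


if __name__ == "__main__":
    rows = join_availability(TOP_PIRATED, AVAILABILITY)
    for r in rows:
        print(r["rank"], r["title"], r["available_legally"])
    print(summarize(rows))
```

Joining on an ID rather than on the title avoids mismatches between remakes or similarly named films, though the join is only as good as the records that come back for each ID.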
The idea for the site came to me last month when RIAA president Cary Sherman was testifying before Congress at a hearing on what further voluntary steps search engines could take to combat piracy. That same day, the MPAA had released a study that found that users who found themselves at URLs for infringing content had been “influenced” by search engines. This was reported in the press as “search engines lead to piracy.” The gist of the study and Sherman’s testimony was that search engines, and in particular Google, were not doing enough to address the fact that for some searches the top results include links to infringing content. The implication, of course, is that if Google didn’t take voluntary action, perhaps Congress should require it to.
At the time I blogged an analysis of the MPAA study and noted that, according to the report, 58% of all visits to infringing URLs that were “influenced” by a search engine came from queries for either generic or title-based terms, not from the more-clearly suspicious “domain” terms. As the report remarked, this “indicat[es] that these consumers did not display an intention of viewing content illegally.” As I wrote at the time:
So the question is, why did these consumers who had no illegal intent end up at infringing sites? Could it be that they did not have a legal alternative to accessing the content they were seeking? That would not excuse their behavior, and it’s the movie industry’s prerogative whether and when to make their content available. Indeed release windows are part of its business model, although a business model seemingly in tension with consumer demand as evidenced by the shrinking theatrical release window. That all said, it’s not clear to me why search engines should be in the business of ensuring other industries’ business models remain unchanged.
After I wrote that, it occurred to me that we could begin to collect data to answer that question, and so I asked Eli and Matt if they wanted to help me build the site. The initial answer the site is generating seems to be that very few of the most-pirated movies are available legally.
To be clear, we only have three weeks of data so far, and we’ll get a better picture in the months ahead as the dataset grows. Additionally, proving the adage that given enough eyeballs all bugs are shallow, we’ve been alerted to the fact that a couple of the movies we were listing as unavailable this week are in fact available. Looking at the problem we found that although we were querying the correct IMDB ID for the movies, Can I Stream It was giving us back the wrong data. We’ve fixed the problem and updated the results. This is all to say that the site will prove its value a year from now when we have a substantial dataset.
That said, one implication of the early results may be that when movies are unavailable, illegal sources are the most relevant search results, so search engines like Google are just telling it like it is. That is their job, after all.
Also, while there is no way to establish causality between the fact that these movies are not available legally and that they are the most pirated, it does highlight that while the MPAA is asking Google to take voluntary action to change search results, it may well be within the movie studios’ power to change those results by taking voluntary action themselves. That is, they could make more movies available online and sooner, perhaps by collapsing the theatrical release window. Now, their business model is their prerogative, and it’s none of my business to tell them how to operate, but by the same token I don’t see how they can expect search engines and Congress to bend over backwards to protect the business model they choose.
As we continue to debate the responsibilities of different actors in the Internet ecosystem with respect to piracy, we hope PiracyData.org will provide useful context.