Copyright

Today, we launched PiracyData.org, a site that takes the top ten most pirated movies of the week and mashes them up with data on legal online availability. Our hope is to build an extensive time-series dataset that can help shed light on the relationship between piracy and viewing options.

As might be expected with a new site, we’ve experienced some launch day glitches with the accuracy of our data and our visitors have thankfully pointed these out. We are of course committed to getting it right, so in the spirit of full transparency, we want to explain exactly what has gone wrong and how we plan on fixing it.

First, let me explain in detail how our site works and the exact data sources that we are using. Every hour, PiracyData.org polls the RSS feed for TorrentFreak’s most pirated movies posts. If the new week’s data is not yet in our database, we add it and fetch each movie’s availability from CanIStream.It.
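The hourly polling step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the site's actual code. The `WeeklyChart` type and the `poll_once` function are hypothetical, and a plain dictionary stands in for the database. The RSS parsing and the CanIStream.It lookup are passed in as functions rather than implemented.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class WeeklyChart:
    # Hypothetical shape of one TorrentFreak weekly post after parsing.
    week: str              # identifier for the week, e.g. an ISO date string
    imdb_ids: list[str]    # the top ten movies, in rank order

def poll_once(
    fetch_latest_chart: Callable[[], WeeklyChart],
    fetch_availability: Callable[[str], dict],
    db: dict[str, dict[str, dict]],
) -> bool:
    """Run one poll (meant to be scheduled hourly). Returns True if a new week was stored."""
    chart = fetch_latest_chart()
    if chart.week in db:
        return False  # this week is already stored, so nothing is refetched
    # New week: look up availability for each pirated movie and store it.
    db[chart.week] = {imdb_id: fetch_availability(imdb_id) for imdb_id in chart.imdb_ids}
    return True
```

Only the first poll after a new post does any lookups. Every later poll that week returns `False` without touching the stored data.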

CanIStream.It is a great site, but it is a little difficult for a computer to read. You can’t look up a movie by IMDB ID, which is pretty much the universal identifier for movies. What you can do, however, is pull up a CanIStream.It widget using IMDB ID.

The widget separates availability into four categories: streaming, rental, purchase, and physical DVDs. Given that this is a discussion of online piracy, we are really only interested in the first three categories, but we preserve all four. We scrape the page for movie availability on all of the services that the widget lists.
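One way to model the four categories is sketched below. It assumes each category maps to a list of service names scraped from the widget. The `Availability` class and its field names are hypothetical. The DVD list is kept but ignored when deciding whether a movie is available online.

```python
from dataclasses import dataclass, field

@dataclass
class Availability:
    # Each list holds the names of services the widget lists under that category.
    streaming: list[str] = field(default_factory=list)
    rental: list[str] = field(default_factory=list)
    purchase: list[str] = field(default_factory=list)
    dvd: list[str] = field(default_factory=list)  # physical DVDs: preserved, not "online"

    def available_online(self) -> bool:
        """True if any streaming, rental, or purchase service is listed."""
        return bool(self.streaming or self.rental or self.purchase)
```

A movie listed only under DVDs, for example `Availability(dvd=["some service"])`, reports `available_online()` as `False`.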

Making our site this way has presented us with four distinct issues that we only discovered once we started getting user feedback on the site:

1. Movie availability may change throughout the week

This is actually not a problem with our data, but with how it’s interpreted. Because the TorrentFreak data is backward-looking, reporting the most pirated movies in the previous week, we only want to report the online availability of movies as it appeared on Monday. That is, we are intentionally taking a snapshot of Monday availability. If movies become available for rental on Tuesday, we will continue to report throughout the remainder of the week that they were not available to rent on Monday, because that is most likely to reflect the state of the world during the preceding week when the piracy was happening.
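The Monday snapshot rule comes down to a small date calculation. This sketch assumes snapshots are stored in a dictionary keyed by the date each one was taken. The names `snapshot_monday` and `availability_as_reported` are illustrative and do not come from the site.

```python
from datetime import date, timedelta

def snapshot_monday(day: date) -> date:
    """Return the Monday on or before `day` (date.weekday() is 0 for Monday)."""
    return day - timedelta(days=day.weekday())

def availability_as_reported(snapshots: dict[date, bool], day: date) -> bool:
    """Report that week's Monday snapshot for any day, even if availability changed later."""
    return snapshots[snapshot_monday(day)]
```

Suppose the snapshot is `{date(2024, 1, 1): False}`, taken on a Monday. A query for Thursday, `date(2024, 1, 4)`, still returns `False`, even if a rental appeared on Tuesday.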

A number of people have noted that Pacific Rim is now available for rental. We haven’t been able to confirm for sure, but we believe that it was added for rental at some point after we checked, and therefore this does not appear to be an error on our part. We’d appreciate it if anyone can confirm this because we want to make sure we are getting the right results.

2. Some services are available on CanIStream.It that are not listed in the widget, only on the main site

In particular, The Lone Ranger is available for rental only from a Sony service, but that service is absent from the CanIStream.It widget, not only for The Lone Ranger but for all movies. Earlier today, our site reported what the CanIStream.It widget reported: that the movie was not available for rental. However, when it was pointed out to us that CanIStream.It’s main site reports that The Lone Ranger is available on Sony, we updated our data to take account of that. In the future we are going to find a way to ensure that all services are automatically included in our dataset, but this may mean finding another data source or resorting to manual entry.

3. In at least one instance, CanIStream.It returned to us data for the wrong movie.

Here’s how the CanIStream.It widgets work: you go to the base url “http://www.canistream.it/external/imdb/” and add the IMDB ID for the movie you are querying. For example, since Pacific Rim’s ID is tt1663662, you can see the widget for the movie at http://www.canistream.it/external/imdb/tt1663662 .
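Building the widget URL is simple string concatenation. The sketch below also checks the ID format first. That check assumes IMDB title IDs always take the form `tt` followed by digits, and the `widget_url` name is hypothetical.

```python
import re

WIDGET_BASE = "http://www.canistream.it/external/imdb/"
IMDB_TITLE_ID = re.compile(r"tt\d+")  # assumed format: "tt" followed by digits

def widget_url(imdb_id: str) -> str:
    """Build the CanIStream.It widget URL for an IMDB title ID."""
    if not IMDB_TITLE_ID.fullmatch(imdb_id):
        raise ValueError(f"not an IMDB title ID: {imdb_id!r}")
    return WIDGET_BASE + imdb_id
```

For example, `widget_url("tt1663662")` returns `"http://www.canistream.it/external/imdb/tt1663662"`.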

This works perfectly most of the time, but bizarrely, it doesn’t work for This Is the End, whose IMDB ID is tt1245492. When you visit http://www.canistream.it/external/imdb/tt1245492 you get the CanIStream.It widget for Jay and Seth Vs. the Apocalypse, not This Is the End. As an outlier, this caught us totally by surprise, and we have updated the data on our site to reflect the accurate availability data for This Is the End. Again, this is the kind of bug we could only have caught once we had lots of eyes on the site, and we’re grateful for the feedback.
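A correct ID can still return the wrong movie. A cheap safeguard is to compare the title on the returned widget with the title from the pirated-movies list, then hold back any mismatch for manual review. This is a hypothetical check, not something the post says the site does. It also assumes the widget's title can be extracted from the page.

```python
import re

def normalize_title(title: str) -> str:
    # Lowercase and keep only letters/digits, so case and punctuation don't matter.
    return " ".join(re.findall(r"[a-z0-9]+", title.lower()))

def widget_matches(expected_title: str, widget_title: str) -> bool:
    """False means the widget may describe a different movie than the one requested."""
    return normalize_title(expected_title) == normalize_title(widget_title)
```

With this check, `widget_matches("This Is the End", "Jay and Seth Vs. the Apocalypse")` returns `False`. A strict comparison will also flag legitimate alternate titles. That is acceptable as long as flagged movies go to a human rather than being discarded.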

4. The site is built using the best available data.

TorrentFreak and CanIStream.It offer extremely useful data to the public. While we’ve had some issues incorporating the CanIStream.It data, we are grateful for the data they provide. CanIStream.It’s data is typically seen even among industry insiders as reliable. For instance, MPAA’s site wheretowatch.org directs its users to CanIStream.It as a source.

That said, if we want to build the canonical dataset on this issue, we have to do better. We need to make sure that there are no glitches. We would like to work with anyone with access to availability data to make sure that we can compile the most accurate data possible.

We’re not exactly sure what this entails yet. We may have to get availability data directly from the services themselves. If we can secure the cooperation of the services—for example if they would be willing to supply data on the date that each movie by IMDB number became available on their service—we could even compute availability data historically. TorrentFreak has data on pirated movies going back to 2006.
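If services did supply the date each IMDB ID became available, historical availability for any past week would reduce to a date comparison. The sketch below assumes a hypothetical input format that maps each service to `{imdb_id: first_available_date}`. It ignores titles that later leave a service, since the post only contemplates start dates.

```python
from datetime import date

# Hypothetical input: service name -> {imdb_id: date the title first became available}.
AvailableSince = dict[str, dict[str, date]]

def services_on(imdb_id: str, day: date, data: AvailableSince) -> list[str]:
    """Services on which `imdb_id` was already available as of `day`, sorted by name."""
    return sorted(
        service
        for service, titles in data.items()
        if imdb_id in titles and titles[imdb_id] <= day
    )
```

Running this for each movie on a past top-ten list, using the Monday of that week, would reproduce the site's weekly snapshot retroactively.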

One thing is for certain: the dataset that we are proposing to build is important. We have provoked quite a reaction from people on both sides of this issue. We acknowledge that it has been a bumpy launch for our site, but we are committed to getting it right. We ask for everybody’s patience and good-faith assistance as we try to get there.

Today, Eli Dourado, Matt Sherman, and I launched PiracyData.org, a very simple site that tries to help answer the question: are the most-pirated movies each week available for legal streaming, digital rental, or digital purchase? We do this by mashing TorrentFreak’s weekly top-ten list of the most pirated movies on BitTorrent with Can I Stream It’s database of movie availability. The result is a single-page website that visualizes the results, as well as a downloadable dataset that will grow each week.

The idea for the site came to me last month when RIAA president Cary Sherman was testifying before Congress at a hearing on what further voluntary steps search engines could take to combat piracy. That same day, the MPAA had released a study that found that users who found themselves at URLs for infringing content had been “influenced” by search engines. This was reported in the press as “search engines lead to piracy.” The gist from the study and Sherman’s testimony was that search engines, and in particular Google, were not doing enough to address the fact that for some searches the top results include links to infringing content, and the implication, of course, is that if Google didn’t take voluntary action, perhaps Congress should require it to.

At the time I blogged an analysis of the MPAA study and noted that, according to the report, 58% of all visits to infringing URLs that were “influenced” by a search engine came from queries for either generic or title-based terms, not from the more-clearly suspicious “domain” terms. As the report remarked, this “indicat[es] that these consumers did not display an intention of viewing content illegally.” As I wrote at the time:

So the question is, why did these consumers who had no illegal intent end up at infringing sites? Could it be that they did not have a legal alternative to accessing the content they were seeking? That would not excuse their behavior, and it’s the movie industry’s prerogative whether and when to make their content available. Indeed release windows are part of its business model, although a business model seemingly in tension with consumer demand as evidenced by the shrinking theatrical release window. That all said, it’s not clear to me why search engines should be in the business of ensuring other industries’ business models remain unchanged.

After I wrote that, it occurred to me that we could begin to collect data to answer that question, so I asked Eli and Matt if they wanted to help me build the site. The initial answer the site is generating seems to be that very few of the most-pirated movies are available legally.

To be clear, we only have three weeks of data so far, and we’ll get a better picture in the months ahead as the dataset grows. Additionally, proving the adage that given enough eyeballs all bugs are shallow, we’ve been alerted to the fact that a couple of the movies we were listing as unavailable this week are in fact available. Looking at the problem we found that although we were querying the correct IMDB ID for the movies, Can I Stream It was giving us back the wrong data. We’ve fixed the problem and updated the results. This is all to say that the site will prove its value a year from now when we have a substantial dataset.

That said, one implication of the early results may be that when movies are unavailable, illegal sources are the most relevant search results, so search engines like Google are just telling it like it is. That is their job, after all.

Also, while there is no way to establish causality between the fact that these movies are not available legally and the fact that they are the most pirated, it does highlight that while the MPAA is asking Google to take voluntary action to change search results, it may well be within the movie studios’ power to change those results by taking voluntary action themselves. That is, they could make more movies available online and sooner, perhaps by collapsing the theatrical release window. Now, their business model is their prerogative, and it’s none of my business to tell them how to operate, but by the same token I don’t see how they can expect search engines and Congress to bend over backwards to protect the business model they choose.

As we continue to debate what are the responsibilities of different actors in the Internet ecosystem related to piracy, we hope PiracyData.org will provide useful context.

Over the past year, as the debate over internet radio royalty rates has raged, I have been a lonely voice calling for the repeal of compulsory licensing of digital performance rights altogether. I did so at the Cato event for my book, Copyright Unbalanced; in January at a State of the Net panel; and in my Reason column. The reaction I often received was either outrage from the Pandoras of the world or condescension toward my naive optimism. Well, optimism can pay off. Yesterday Rep. Mel Watt, ranking member of the House Judiciary Committee’s Subcommittee on Courts, Intellectual Property and the Internet, introduced the “Free Market Royalty Act,” which among other things gets rid of compulsory licensing.

The problem with the compulsory licensing scheme is twofold: not only does it rely on federal bureaucrats to set the rates that artists must accept for their music (rather than allowing a free-market negotiation to take place between copyright holders and those who want to broadcast their songs), but it also allows Congress to pick winners and losers by assigning different royalty rate standards to different users. As I explained in Reason:

While AM, FM, cable and satellite radio, and Internet radio services like Pandora can all opt for compulsory licenses, they each pay different royalty rates. The rates are set by a panel of government lawyers called the Copyright Royalty Board, and they have the effect of favoring some business models over others. Internet radio services pay over 60 percent of their revenue in royalties, while Sirius XM, the only satellite radio company, pays only 8 percent. AM and FM radio aren’t subject to a digital sound recording right, so they pay zero.

Watt’s bill would blow all this up, forcing terrestrial broadcasters, Internet radio services, and the rest to give up their price-fixed compulsory licenses and negotiate the rates they pay in the market. This truly levels the playing field, especially vis-a-vis interactive music services like Spotify and Rdio that have never benefited from compulsory licenses.

Whether you talk to supporters of Rep. Chaffetz’s Internet Radio Fairness Act or Rep. Nadler’s Interim FIRST Act, each will say their bill is the true free-market approach, and that their rate-setting standard would best approximate a market. To them I say, nothing better approximates a market than the market itself, so if they are truly concerned about ensuring a free-market, level playing field, here is the way to do it.

One advantage of compulsory licensing is that it can reduce transaction costs. The Watt bill retains some of this advantage by designating SoundExchange, a nonprofit agency, as the common agent for copyright owners to facilitate negotiations, while allowing labels and artists to retain the right to opt out and negotiate on their own. If this bill passes, I think we’ll see some very interesting experimentation with business models on the part of both the artists and the radio stations.

Finally, looking at the press coverage of this bill, what has gotten the most attention is that it would, for the first time, require terrestrial AM/FM radio stations to negotiate and pay royalties for the sound recordings they broadcast. The way I see it, it’s not clear why broadcasters deserve yet another subsidy, so I shed no tears for them if this bill passes. Broadcasters argue that they provide promotional value for the songs they broadcast, that this benefits copyright holders, and that they should therefore continue to pay nothing. If it is indeed the case that airplay provides substantial promotional value, that will be taken into account in the course of negotiations and we should expect the ultimate rate to reflect that. Indeed, you can even imagine an outcome where the free-market rate for terrestrial stations would remain at zero, or even that copyright holders would want to pay the stations. That’s the beauty of the market, so let’s unleash it.

Yesterday the MPAA issued a report commissioned from the global PR firm Millward Brown looking at "the role of search in online piracy." This coincided with testimony by the RIAA's Cary Sherman before the House IP subcommittee that search engines are not doing enough to protect his industry from piracy. Here are some thoughts on the new report and the issue generally.

The report tries to ascertain how much of the traffic to infringing content is sent there by search engines. To measure this, the report employs "a customized, hybrid approach" that doesn't merely look at whether the visit to an infringing URL came from a link on a search page. Instead, it looks at whether a user searched for a "qualifying" search term within 20 minutes of reaching the infringing URL. "Qualifying" search queries, the report says, are associated with attempts to find illegal content and include "domain terms like '1Channel' and 'sidereel', generic terms like 'watch movies online' and movie and TV title-based terms like 'Dark Knight Rises'." As the report puts it, "This holistic approach contrasts with a more narrow definition that counts search only when a visit is preceded by a visit to a search engine."
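Read literally, the report's attribution rule amounts to a time-window check. The sketch below is one reading of the report's description, not its actual methodology code. It assumes the qualifying searches have already been identified and uses the 20-minute window the report cites. The function name is hypothetical.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=20)  # window described in the report

def search_influenced(visit_time: datetime, qualifying_searches: list[datetime]) -> bool:
    """True if any qualifying search occurred within the 20 minutes up to the visit."""
    return any(visit_time - WINDOW <= t <= visit_time for t in qualifying_searches)
```

The rule checks only timing. It does not require the search and the visit to concern the same title.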

The report is clear that "this method did not seek to indicate the degree to which infringing content appears on search engine results pages themselves," but merely sought to show that search engines "influenced the path" users took to reach infringing content. It concluded that "approximately 20% of all visits to infringing content were influenced by a search query from 2010-2012."

I have a couple of concerns with this methodology. First, it implicitly puts search engines on the hook not just for linking directly to infringing content (for which there is a notice-and-takedown process available), but also for "influencing the path" that a user takes on their web travels. As we all know, correlation is not causation, so it's not clear to me that my searching for "transformers" 15 minutes before visiting the URL for a pirate stream of Game of Thrones means that the search engine influenced me in any way, much less that it should be held responsible for my behavior.

It’s been over five years since Congress passed major legislation addressing copyright protection, but this hasn’t stopped copyright owners from achieving real progress in securing their expressive works. In cooperation with private-sector stakeholders, rights holders have made several deals aimed at combating copyright infringement and channeling consumer demand for original content toward legitimate outlets. These voluntary agreements will be the subject of a hearing this afternoon (9/18) before the House Judiciary Committee’s Subcommittee on Courts, Intellectual Property and the Internet. This panel marks the latest in a series of hearings the committee launched earlier this year to review the Copyright Act, much of which dates back to 1976 or earlier.

Copyright consensus may sound like an oxymoron, especially in the wake of last year’s bruising legislative battle over SOPA and PIPA. But in reality, there’s no shortage of common ground when it comes to copyright protection. Despite all the controversy that surrounds the issue, copyright isn’t so much a “conflict of visions”, to borrow from Thomas Sowell, but a conflict of tactics, as I argued earlier this year on Cato Unbound.

Indeed, with some notable exceptions, most scholars, business leaders, and policymakers accept that government has a legitimate and important role in securing to inventors and creators the fruits of their labors. Unsurprisingly, the devil is in the details, where genuinely tough questions arise regarding the government’s proper role in policing the Internet for copyright violations. Should the law hold online intermediaries accountable for their users’ infringing acts? What remedies should the law afford rights holders whose works are unlawfully distributed all over the Internet, often by profit-generating foreign actors?

No doubt I won’t be the only one to point out how funny it is that today’s New York Times front page exposé on the excesses of the Renewable Fuel Standard puts the blame on Wall Street firms trading in the market for ethanol credits. But I also want to make a comparison to intellectual property.

As the headline puts it, “Wall St. Exploits Ethanol Credits, and Prices Spike.” Yet ethanol credits affect the price of gasoline only because the government created them out of thin air and mandated their use. Having created a new commodity–and a mandatory one for many refiners–it’s no surprise speculators entered the market. Yet this is how the NYT describes it:

The banks say they have far less influence in the market than others are suggesting, and are doing nothing wrong. But the activities, while legal, could have consequences for consumers.

See that? It’s the perfectly legal activities of the banks that will have consequences for consumers. I’d say it’s the entire program itself, created out of thin air by the government, that allows for these activities in the first place.

Because Congress and the EPA didn’t accurately predict future gasoline consumption (shocker that) they set the amount of ethanol that refiners must blend into gasoline too high. Refiners are on the hook to use more ethanol than possible, which forces them to buy ethanol credits instead. So of course commodity speculators are going to play in this made-up market, but it’s not the players we should be hating, it’s the game.

And for the record, I don’t mean to excuse the banks. I don’t know enough about this issue, but it wouldn’t surprise me if the banks had a hand in getting the government to create this market. If they did, then that’s par for the crony capitalist course.

Yesterday, Time Warner Cable and CBS reached a deal to end the weeks-long impasse that had resulted in CBS being blacked out in over 3 million U.S. households.

I predicted the two companies would resolve their differences before the start of the NFL season in a RealClearPolicy op-ed published last week:

From Los Angeles to New York, 3 million Americans in eight U.S. cities haven’t been able to watch CBS on cable for weeks, because of a business dispute between the network and Time Warner Cable (TWC). The two companies can’t agree on how much TWC should pay to carry CBS, so the network has blacked out TWC subscribers since August 1. With the NFL season kicking off on September 5, the timing couldn’t be worse for football fans.

Regulators at the Federal Communications Commission (FCC) face growing pressure to force the feuding companies to reach an agreement. But despite viewers’ frustrations with this standoff, government intervention isn’t the answer. If bureaucrats begin “overseeing” disputes between network owners and video providers, television viewers will face higher prices or lower-quality shows.

TWC and CBS are playing hardball over serious cash. CBS reportedly seeks to double its fee to $2 per subscriber each month, which TWC claims is an outrageous price increase. But CBS argues it costs more and more to develop hit new shows like Under the Dome, so it’s only fair viewers pay a bit more.

Both sides have a point. TWC is looking out for its millions of subscribers—and its bottom line—by keeping programming costs down. CBS, on the other hand, needs cash to develop creative new content, and hopes it can make some money doing so.

There are few things more likely to get constituents to call their representative than TV programming blackouts, and the increase in broadcasting disruptions arising from licensing disputes in recent years means Congress may be forced to once again fix television and copyright laws. As Jerry Brito explains at Reason, the current standoff between CBS and Time Warner Cable is the result of bad regulations, which contribute to more frequent broadcaster blackouts. While each type of TV distributor (cable, satellite, broadcasters, telcos) is both disadvantaged and advantaged through regulation, broadcasters are particularly favored. As the US Copyright Office has said, the rule at issue in CBS-TWC is “part of a thicket of communications law requirements aimed at protecting and supporting the broadcast industry.”

But as we approach a damaging tipping point of rising programming costs and blackouts, Congress’ potential rescuer–Aereo–appears on the horizon, possibly buying more time before a major regulatory rewrite. Aereo, for the uninitiated, is a small online company that sets up tiny antennas in certain cities to capture broadcast television station signals–like CBS, NBC, ABC, Fox, the CW, and Univision–and streams those signals online to paying customers, who can watch live or record the local signals captured by their own “rented” Aereo antenna. Broadcasters hate this because the service deprives them of lucrative retransmission fees, and they have unsuccessfully sued to get Aereo to cease operations.

Sherwin Siy, Vice President of Legal Affairs at Public Knowledge, discusses emerging issues in digital copyright policy. He addresses the Department of Commerce’s recent green paper on digital copyright, including the need to reform copyright laws in light of new technologies. This podcast also covers the DMCA, online streaming, piracy, cell phone unlocking, fair use recognition, digital ownership, and what we’ve learned about copyright policy from the SOPA debate.

Patrick Ruffini, political strategist, author, and President of Engage, a digital agency in Washington, DC, discusses his latest book with coauthors David Segal and David Moon: Hacking Politics: How Geeks, Progressives, the Tea Party, Gamers, Anarchists, and Suits Teamed Up to Defeat SOPA and Save the Internet. Ruffini covers the history behind SOPA, its implications for Internet freedom, the “Internet blackout” in January of 2012, and how the threat of SOPA united activists, technology companies, and the broader Internet community.