Jane Yakowitz of Brooklyn Law School recently posted an interesting 63-page paper on SSRN entitled, “Tragedy of the Data Commons.” For those following the current privacy debates, it is must reading since it points out a simple truism: increased data privacy regulation could result in the diminution of many beneficial information flows.
Cutting against the grain of modern privacy scholarship, Yakowitz argues that “The stakes for data privacy have reached a new high water mark, but the consequences are not what they seem. We are at great risk not of privacy threats, but of information obstruction.” (p. 58) Her concern is that “if taken to the extreme, data privacy can also make discourse anemic and shallow by removing from it relevant and readily attainable facts.” (p. 63) In particular, she worries that “The bulk of privacy scholarship has had the deleterious effect of exacerbating public distrust in research data.”
Yakowitz is right to be concerned. Access to broad data sets that include anonymized profiles of individuals is profoundly important for countless sectors and professions: journalism, medicine, economics, law, criminology, political science, environmental sciences, and many, many others. Yakowitz does a brilliant job documenting the many “fruits of the data commons” by showing how “the benefits flowing from the data commons are indirect but bountiful.” (p. 5) This isn’t about those sectors making money. It’s about how researchers in those fields use information to improve the world around us. In essence, more data = more knowledge. If we want to study and better understand the world around us, researchers need access to broad (and continuously refreshed) data sets. Overly restrictive privacy regulations or forms of liability could slow that flow, weaken research capabilities and output, and leave society worse off because of the resulting ignorance.
Consequently, her paper includes a powerful critique of the “de-anonymization” and “easy re-identification” fears set forth by the likes of Paul Ohm, Arvind Narayanan, Vitaly Shmatikov, and other computer scientists and privacy theorists. These scholars have suggested that because a slim possibility exists that some individuals in certain data sets could be re-identified even after their data is anonymized, that fear should trump all other considerations and public policy should be adjusted accordingly (specifically, in the direction of stricter privacy regulation / tighter information controls).
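To make the underlying concern concrete, the classic re-identification scenario is a “linkage attack,” in which quasi-identifiers left in an anonymized data set (say, ZIP code, birth year, and sex) are matched against an outside source that still carries names. The toy Python sketch below is my own illustration of that idea, not anything drawn from Yakowitz’s paper or from the scholars she critiques; every record and field name in it is invented.

```python
# Toy illustration of a "linkage attack": matching quasi-identifiers in an
# anonymized research data set against a hypothetical auxiliary source
# (e.g., a public voter roll) that still carries names.
# All records below are invented for illustration.

anonymized = [  # direct identifiers removed; quasi-identifiers retained
    {"zip": "11201", "birth_year": 1975, "sex": "F", "diagnosis": "asthma"},
    {"zip": "11201", "birth_year": 1982, "sex": "M", "diagnosis": "diabetes"},
]

auxiliary = [  # hypothetical outside data with names attached
    {"name": "Jane Doe", "zip": "11201", "birth_year": 1975, "sex": "F"},
    {"name": "John Roe", "zip": "11210", "birth_year": 1982, "sex": "M"},
]

def link(anon_rows, aux_rows, keys=("zip", "birth_year", "sex")):
    """Return (name, diagnosis) pairs whose quasi-identifiers match
    exactly -- candidate re-identifications."""
    matches = []
    for anon in anon_rows:
        for aux in aux_rows:
            if all(anon[k] == aux[k] for k in keys):
                matches.append((aux["name"], anon["diagnosis"]))
    return matches

print(link(anonymized, auxiliary))
# [('Jane Doe', 'asthma')] -- one record is plausibly matched, the other
# is not, which is the kind of marginal risk the debate turns on.
```

As I read her, Yakowitz’s point is not that such matches are impossible in principle, but that their real-world likelihood and harm are far lower than the precautionary framing assumes.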
She goes on to brilliantly dissect and counter this argument, which is tantamount to a ‘better safe than sorry’ mentality. “If public policy had embraced this expansive definition of privacy — that privacy is breached if somebody in the database could be reidentified by anybody else using special non-public information — dissemination of data would never have been possible,” she observes.
If anything, Yakowitz doesn’t go far enough here. In my recent filing to the Federal Trade Commission in its “Do Not Track” proceeding, I noted that we are witnessing the rise of a “Privacy Precautionary Principle” that threatens to block many forms of digital progress through a “Mother, May I” form of prophylactic information regulation. Basically, you won’t be allowed to collect many forms of information / data until you receive permission from some agency, or at least an assurance that crushing liability won’t be imposed later if you do. Such a Privacy Precautionary Principle would have troubling implications for the future of the Internet and free speech (especially press freedoms), since it threatens to stop digital progress in its tracks based on conjectural fears about data collection / aggregation. But its effect will be even more deleterious on the research community. Yakowitz addresses this danger brilliantly in her paper when she notes:
Privacy advocates tend to play the role of doom prophets — their predictions of troubles are ahead of their times. Convinced of the inevitability of the harms, privacy scholars are dissatisfied with reactive, rather than proactive, regulation. Reactive legislation gets a bad rap, but it is the most appropriate course for anonymized research data. Legislation inhibiting the dissemination of research data would have guaranteed drawbacks today for the research community and to society at large. We should find out whether reidentification risk materializes before taking such drastic measures. (p. 48-49)
Quite right. The application of a Privacy Precautionary Principle would have radical ramifications for the research community and, by extension, for society more generally. It is essential, therefore, that policymakers think carefully about calls for sweeping privacy controls and, at a minimum, conduct a comprehensive cost-benefit analysis of any proposed regulations before shutting off the flow of information online.
But we should realize that there will never be black-and-white answers to some of these questions. Using the example of Google Flu Trends, Yakowitz notes that it is often impossible to predict in advance which data or data sets will prove “socially valuable,” and thus impossible to determine preemptively what should be allowed vs. restricted:
Google Flu Trends exemplifies why it is not possible to come to an objective, prospective agreement on when data collection is sufficiently in the public’s interest and when it is not. Flu Trends is an innovative use of data that was not originally intended to serve an epidemiological purpose. It makes use of data that, in other contexts, privacy advocates believe violate Fair Information Practices. This illustrates a concept understood by data users that is frequently discounted by the legal academy and policymakers: some of the most useful, illuminating data was originally collected for a completely unrelated purpose. Policymakers will not be able to determine in advance which data resources will support the best research and make the greatest contributions to society. To assess the value of research data, we cannot cherry-pick between “good” and “bad” data collection. (p. 11-12)
Importantly, Yakowitz also notes that data access is a powerful accountability mechanism. “A thriving public data commons serves the primary purpose of facilitating research, but it also serves a secondary purpose of setting a data-sharing norm so that politically-motivated access restrictions will stick out and appear suspect.” (p. 17) In essence: the more data that is available, the better our chances of holding those around us accountable and checking their power. (David Brin also made this point brilliantly in his provocative 1998 book, The Transparent Society, in which he noted that it was access to information and openness to data flows that put the “light” in Enlightenment!)
Yakowitz feels so passionately about openness and access to data that she goes on to propose a safe harbor to shield data producers / aggregators from liability if they follow a set of reasonable anonymization protocols. She compares this to the protection that the First Amendment offers journalists as they seek to unearth and disseminate important truthful information of public interest. She argues that aggregated research data serves a similar purpose because of the myriad ways it benefits society, and that those who produce and aggregate it therefore deserve some protection from punishing liability. (I thought she might also reference CDA Sec. 230 here, but she didn’t. Sec. 230 immunizes online intermediaries from potentially punishing forms of liability for the content sent over their digital networks. The purpose is to ensure that information continues to flow across digital networks by avoiding the “chilling effect” that looming liability would have on intermediaries and online discourse. See my essays: 1, 2, 3, 4.)
Anyway, read the entire Yakowitz paper. It deserves serious attention. It could really shake up the current debate over privacy regulation.