The privacy debate has been increasingly shaped by an apparent consensus that de-identifying sets of personally identifying information doesn’t work. In particular, this has led the FTC to abandon the PII/non-PII distinction on the assumption that re-identification is too easy. But a new paper shatters this supposed consensus by rebutting the methodology of Latanya Sweeney’s seminal 1997 study of re-identification risks, which in turn shaped HIPAA’s rules for de-identification of health data and has influenced the larger privacy debate ever since.
This new critical paper, “The ‘Re-Identification’ of Governor William Weld’s Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now,” was published by Daniel Barth-Jones, an epidemiologist and statistician at Columbia University. After carefully re-examining the methodology of Sweeney’s 1997 study, he concludes that re-identification attempts will face “far-reaching systemic challenges” that are inherent in the statistical methods used to re-identify. In short, re-identification turns out to be harder than it seemed, so our identities can more easily be obscured in large data sets. Privacy law scholars and public policy-makers must understand this more nuanced story if they want to realistically assess the current privacy risks posed by de-identified data, not just for health data but for all data.
The importance of Barth-Jones’s paper is underscored by the example of Vioxx, which stayed on the market years longer than it should have because of HIPAA’s privacy rules, resulting in between 88,000 and 139,000 unnecessary heart attacks and 27,000 to 55,000 avoidable deaths, as University of Arizona Law Professor Jane Yakowitz Bambauer explained in a recent Huffington Post piece.
Ultimately, overstating the risk of re-identification causes policymakers to strike the wrong balance between privacy and other competing values. As Barth-Jones and Yakowitz have suggested, policymakers should instead focus on setting standards for proper de-identification of data that are grounded in a rigorous statistical analysis of re-identification risks. A safe harbor for proper de-identification, combined with legal limitations on re-identification, could protect consumers against real privacy harms while still allowing the free flow of data that drives research and innovation throughout the economy.
Unfortunately, the Barth-Jones paper has not received the attention it deserves. So I encourage you to consider writing about it, or just take a moment to share it with your friends on Twitter or Facebook.