“De-identified”? Sometimes You Can Disagree With Yourself

May 28, 2009 · 16 comments

Recall a couple of years ago when I lauded Google – and also picked on them – for making customer data “more anonymous”?

“‘Anonymous’ is correctly regarded as an absolute condition,” I wrote. “Like pregnancy, anonymity is either there or it’s not. Modifying the word with a relative adjective like ‘more’ is a curious use of language.”

The challenge of these concepts – “anonymized” or “de-identified” data – is still around, and it’s still a difficult one.

Here’s a sophisticated take on the question:

Information is increasingly difficult to classify as “identified” or “de-identified,” particularly as it is copied, exchanged, or recombined with other information. With rapidly evolving technologies and databases, it is more appropriate to describe a spectrum of “identifiability,” rather than a binary classification of information as identifiable or not. The question could then become not whether deidentified information might be made re-identifiable, but rather which entities would be able to re-identify the information, how much effort they would have to expend, and what limits are placed on their doing so.

And here’s an advocacy group apparently lacking that sophistication. They treat information as flatly “de-identified” in a legal filing about a New Hampshire law that bans the sale of prescription drug data for marketing purposes:

[T]he Prescription Information Law does not implicate patient privacy. While it purports to protect privacy interests, the statute regulates patient de-identified information.

Here’s the thing: Both quotes were issued by the Center for Democracy and Technology.

The first is from CDT’s filing with the Department of Health and Human Services about the circumstances under which HHS should require health providers to notify patients about a data breach. CDT wants more breach notices, so it argues that information might be pieced together. On that view, the concept of “de-identification” is weak.

The second quote is from a CDT legal brief asking the Supreme Court to review (and I believe they would argue to reject) the New Hampshire law. CDT wants the data to be shared, so it argues that the data is “de-identified.”

However, as data is copied, exchanged, or recombined with other information such as payment claims to Medicare and Medicaid, it’s easy to imagine records of doctors’ prescribing practices being used to help piece together patients’ drug-taking habits and health conditions.
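To make that recombination concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the records, the field names, and the choice of linking fields are assumptions for illustration, not anything drawn from the New Hampshire data or the CDT filings. It joins a nameless prescription table to a named claims table on the fields they share and shows the three outcomes you would expect on a "spectrum of identifiability": a unique match, an ambiguous match, and no match.

```python
from collections import defaultdict

# Hypothetical "de-identified" prescription records: no names, but some
# linkable fields (prescriber, 3-digit ZIP, birth year, sex) remain.
prescriptions = [
    {"prescriber": "D1", "zip3": "032", "birth_year": 1950, "sex": "F", "drug": "drug_a"},
    {"prescriber": "D1", "zip3": "032", "birth_year": 1971, "sex": "M", "drug": "drug_b"},
    {"prescriber": "D2", "zip3": "038", "birth_year": 1971, "sex": "M", "drug": "drug_c"},
]

# Hypothetical identified records (for example, payment claims) that
# happen to carry the same linkable fields alongside a name.
claims = [
    {"name": "Patient One", "prescriber": "D1", "zip3": "032", "birth_year": 1950, "sex": "F"},
    {"name": "Patient Two", "prescriber": "D2", "zip3": "038", "birth_year": 1971, "sex": "M"},
    {"name": "Patient Three", "prescriber": "D2", "zip3": "038", "birth_year": 1971, "sex": "M"},
]

# Assumed set of fields shared by both datasets.
LINK_FIELDS = ("prescriber", "zip3", "birth_year", "sex")


def link_key(record):
    """Build the tuple of shared fields used to join the two datasets."""
    return tuple(record[field] for field in LINK_FIELDS)


def link(prescriptions, claims):
    """Return (drug, candidate_names) for each prescription record."""
    names_by_key = defaultdict(set)
    for claim in claims:
        names_by_key[link_key(claim)].add(claim["name"])
    linked = []
    for rx in prescriptions:
        candidates = sorted(names_by_key.get(link_key(rx), ()))
        linked.append((rx["drug"], candidates))
    return linked


linked = link(prescriptions, claims)
for drug, candidates in linked:
    if len(candidates) == 1:
        status = "re-identified"
    elif candidates:
        status = "ambiguous"
    else:
        status = "no match"
    print(drug, candidates, status)
```

In this toy data only the first record links to a single name. The point is that whether a record is "de-identified" depends on what other data the person holding it can join it against, and how much effort that join takes.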

Is this mendacity on the part of CDT? I don’t think so. It illustrates how difficult these issues are, even for sophisticated parties. Until more intellectual groundwork is laid, information policy arguments before regulators, lawmakers, and courts will not rest on solid footing. Everyone’s trying their best!

You’re dying to know the right answers, of course: Government-mandated data breach notifications are part of a growing trend toward command-and-control data security. Giving injured parties common law remedies and letting the legal incentives sort things out would be much better. HHS, of all agencies, should not be doing data security.

The New Hampshire law weakens drug companies’ ability to market to doctors, which deprives doctors of information that could help them serve patients better. The remote privacy risk to patients when doctors’ prescribing practices are shared should also be handled by common law remedies rather than the state’s regulation, with its attenuated privacy claims. Rather than the U.S. Supreme Court finding a federal trump card, the legislature in New Hampshire should correct its error and maximize the flow of information in the state’s health care system.

  • http://www.cdt.org/healthprivacy Deven McGraw

    Thanks for the opportunity to explain in more detail what some have identified as conflicting (or should I say schizophrenic) statements from the Center for Democracy & Technology (CDT) about the privacy risks posed by “de-identified” data. (When we say “de-identified,” we’re talking about data that is considered de-identified per the HIPAA standard.) The perceived disconnect in our rhetoric illustrates the challenges of advocating on this issue before different audiences. The challenge is especially pronounced when, as here, one context is adversarial and tends to favor positions that are strong and straightforward, while the other context is deliberative and has greater tolerance for nuance.

    CDT used stronger language in the legal brief primarily because we were in a litigation posture. Our goal was to clearly refute the notion that de-identified data (de-identified per the HIPAA standard) poses the same privacy risks to individuals as fully identified data. This was a position espoused by a number of the legislators who supported the New Hampshire statute (and hence it was part of the legislative history). The brief actually does note that there is a risk to patient privacy in the form of re-identifying patients using HIPAA de-identified data. The brief also states that HIPAA protections could be enhanced, including by strengthening prohibitions against re-identification of de-identified data. But you are correct, the quote you pulled out of our brief dwarfs these other qualifying statements.

    Equating HIPAA de-identified data with fully identifiable data significantly undermines our ability to advocate that most (or at least more) secondary uses of data be undertaken with either de-identified (or lesser identified) data that is sufficiently protected against re-identification. Using data in “less identified” form (while protecting it against re-identification) minimizes the risk to individual privacy while allowing data to be accessed for activities that can improve health care and contribute to the public good.

    The more nuanced position we took in our comments to HHS raised concerns about de-identified data being exempt from breach notification requirements. Here, the lack of strong protections against re-identification is the issue – and the exemption for de-identified data means that patients won’t be notified even if the data is subsequently re-identified. The “safe harbor” method, which is the one most commonly used to de-identify, is more than five years old, and the world has changed dramatically since that time (particularly with respect to the availability of data). “Safe harbor” needs a thorough review, so we asked for and supported the provision in HITECH that requires HHS to study this standard. This more nuanced position is also reflected in a white paper on de-identification we will publish later this month.

    So yes, this is a difficult issue. The study by HHS gives us an opportunity to engage in a public dialogue about ways to best resolve it. We hope it will be a vigorous and productive debate.

