OSTWG, Child Protection, Privacy & Data Retention Mandates v. “Behavioral” Advertising

February 4, 2010

Today’s Online Safety Technical Working Group (OSTWG) meeting included some heated debate about whether online intermediaries should be doing more to assist law enforcement in tracking down child predators and those producing and distributing child pornography. (It’s not yet clear whether or when NTIA will put the archived video or a transcript online.)

Most interesting was the third panel of the day (agenda), which devolved into a shouting match as Dr. Frank Kardasz (resume) of the Arizona Internet Crimes Against Children (ICAC) Task Force basically accused Internet intermediaries of being willing accomplices in crimes of sexual abuse against children—and suggested that they could be charged as co-defendants in child porn prosecutions. A few industry folks in the room expressed their outrage at such slander. A retired law enforcement officer perhaps put it best when he said that he had never dealt with an ISP that didn’t sincerely want to help law enforcement stop this monstrous crime.

Apart from those pyrotechnics, and a superb morning presentation by the Pew Internet Project’s Amanda Lenhart about “Social Media & Young Adults,” the most interesting part of the day concerned data retention mandates. Even as a debate rages in Washington about how much collection and use of online data should be permitted, Dr. Kardasz suggested that online service providers should be required to hold user data for five years. A number of attendees noted the staggering costs of such a mandate given the sheer volume of information shared every day by users, especially for startups, for whom building monitoring and compliance infrastructure can be a significant barrier to entry. Of course, practical objections are always answered with practical counter-solutions; in this case, several attendees asked why we couldn’t just provide tax incentives or stimulus money to defray such costs. One attendee joked that we’d have to devote the entire state of Montana just to house all the necessary server farms.

But the strongest objection came from John Morris of the Center for Democracy & Technology, who rightly noted that no amount of government subsidy for data retention could prevent leakage of sensitive private data. For this reason, and because of the basic civil liberties at stake whenever the government has access to large pools of data about its citizens, Morris argued that we need to strike a balance between protecting children and preserving the values of a free society. Dave McClure of the US Internet Industry Association (USIIA) seconded this point powerfully: if such vast stores of data are retained, they will be abused.

Then the riposte from advocates of data retention mandates: Aren’t online intermediaries already retaining huge amounts of consumer information? If they can do that, why can’t they retain the data we need to track down child predators and child porn distributors?

John Morris and the ACLU’s Chris Calabrese patiently explained just how different these two kinds of data retention really are. Advertisers don’t care who you are—just what you’re likely to be interested in. So it simply isn’t worth the cost for them to retain the massive logs of data tracking every site a user has been to and when, or even tying that information to an IP address. All the advertiser wants is to be able to correlate information about likely interests with a cookie that uniquely identifies a computer (which likely, but not necessarily, corresponds to a user). I couldn’t have explained this difference better myself!
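To make the contrast concrete, here is a toy Python sketch of the two kinds of storage. All names and data here are invented for illustration; the point is simply that a retention mandate implies a comprehensive event log, while an advertiser needs only a small cookie-to-categories map.

```python
from collections import defaultdict

# What a data retention mandate implies: logging every request as
# (timestamp, ip_address, full_url) and keeping it for years.
retention_log = []

def log_for_retention(timestamp, ip_address, url):
    retention_log.append((timestamp, ip_address, url))

# What a behavioral advertiser needs: cookie ID -> interest categories.
# No URLs, no IP addresses, no timestamps.
ad_profiles = defaultdict(set)

def record_interest(cookie_id, category):
    ad_profiles[cookie_id].add(category)

log_for_retention("2010-02-04T09:15:00Z", "203.0.113.7",
                  "http://example.com/fly-fishing-rods")
record_interest("cookie-abc123", "fly-fisher")

print(ad_profiles["cookie-abc123"])  # {'fly-fisher'}
```

The asymmetry in what must be stored (and for how long) is the whole argument: the second structure stays tiny and says nothing about where a user has been, only what a cookie seems interested in.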

They didn’t specifically get into this example, but even a company like Phorm, which offers behavioral advertising based on inspecting packets sent back and forth by an Internet user, doesn’t actually retain the kind of “digital dossier” of a user’s browsing activity that some advocates of increased data regulation fear, or that law enforcement wants. Instead, Phorm examines certain kinds of pages visited by users (excluding, for example, HTTPS pages and email) and looks for keywords (excluding sensitive data like phone numbers, Social Security numbers and credit card numbers) that suggest the user might be interested in a particular marketing category. The data about where the user has visited is then discarded, leaving only the marketing categories matched to that user’s unique ID (e.g., dog-owner, fly-fisher).
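The general technique described above can be sketched in a few lines of Python. This is a hedged illustration only; the keyword lists, regular expressions, and category names below are invented for the example and are not Phorm’s actual implementation.

```python
import re

# Patterns for sensitive tokens to strip before matching
# (SSN-like, credit-card-like, US-phone-like; illustrative only).
SENSITIVE = re.compile(
    r"\b(?:\d{3}-\d{2}-\d{4}"
    r"|\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}"
    r"|\(?\d{3}\)?[ -]?\d{3}-\d{4})\b"
)

# Hypothetical keyword -> marketing-category table.
KEYWORD_CATEGORIES = {
    "retriever": "dog-owner",
    "kibble": "dog-owner",
    "trout": "fly-fisher",
    "fly rod": "fly-fisher",
}

def categorize(user_profile, url, page_text):
    """Add matched marketing categories to the profile; retain no URL."""
    # Skip pages the system must not inspect (e.g., encrypted or webmail).
    if url.startswith("https://") or "mail." in url:
        return
    # Strip sensitive tokens before keyword matching.
    text = SENSITIVE.sub(" ", page_text).lower()
    for keyword, category in KEYWORD_CATEGORIES.items():
        if keyword in text:
            user_profile.add(category)
    # The URL and page text go out of scope here; only categories persist.

profile = set()
categorize(profile, "http://example.com/gear",
           "Best trout fly rod reviews. Call 555-867-5309.")
print(profile)  # {'fly-fisher'}
```

Note what the profile ends up holding: a marketing category, with the visited URL, the page text, and the phone number all discarded, which is precisely why this kind of store is useless for reconstructing a browsing history.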

So even when it comes to the much-feared “Deep Packet Inspection,” what advertisers want is profoundly different from the kind of data retention mandates proposed by Kardasz and others in law enforcement. Moreover, given the costs entailed in data storage and processing, the mere fact that something is theoretically possible doesn’t mean advertisers are willing to pay for it just to try to tell you about their product! That critical point has been missing from most of the ongoing conversation about regulating “targeted” advertising, which tends to focus on the theoretical possibility of a particular data collection/use/aggregation practice rather than on whether it’s actually being done, or even whether it would make economic sense to do so. So I’m glad to see John Morris and Chris Calabrese making these vital points.

I don’t mean to pull a “gotcha!” on them as representatives of two organizations that have also been outspoken in calling for restrictions on the private use of data (especially since I can’t do them justice by quoting them precisely here without a transcript of the event or the ability to go back and listen to this fascinating exchange again). I’m sure they would respond that the potential for abuse still exists when private companies collect data about users for advertising purposes: some companies might collect so much data that it could be tied back to a particular user and cause actual harm if released. That would be a fair response, but it would at least place us in a constructive debate between reasonable people about the costs and benefits of data sharing and whether government regulation is really the best way to address privacy concerns.

The important point is that they recognize the difference in kind between the collection of limited amounts of data for advertising purposes and the kind of comprehensive data retention mandates proposed by Kardasz and others. If nothing else, that difference means one can take a principled stance, as I do, against data retention mandates as a governmental invasion of our privacy while also favoring user empowerment, education, targeted enforcement of existing laws, etc. as less restrictive alternatives to government regulation of private data use, just as with parental control and empowerment over parentalist censorship. As Adam Thierer and I have argued, because regulation carries significant costs for consumers, free speech and culture, any government mandates should be narrowly tailored to address real, demonstrable harms rather than vague, unsubstantiated fears or amorphous concepts like “dignity interests.”

The other critical part of our “layered approach” to privacy concerns is building a higher “Wall of Separation Between Web and State.” Concretely, that means opposing such onerous data retention mandates and reforming ECPA—a subject mentioned only at the end of today’s meeting. In the comments I filed recently on the Notice written by CDT for the FCC, I praised CDT’s work in this area and look forward to working with them (and the ACLU and groups like EFF) on that cause in the future, despite our differences on private data use regulation.

  • Guest

    While many great points are made here, it is worth noting that there are two functions sought by online advertisers: targeting and reporting. While much discussion has been given to targeting's use and collection of data, relatively little has been given to reporting, which is unfortunately the larger and more personal of the two data uses.
    While it is correct that from a targeting perspective identity, indeed individuality, is not required, from a reporting perspective this is often not the case. Reporting usually requires either attribution to a real identified individual (CPA, affiliate, ROI) or to a unique identifier (usually IP) which allows certain identities to be ruled out. This raises the interesting question of the difference between knowing who someone is and the ability to prove who someone is not (a question at the heart of the click-fraud debate). While targeting seeks merely to categorize broadly, reporting often seeks to know who engaged in the activity, either to attribute a conversion or to prove that the activity belonged to a legitimate actor (not click fraud, not a robot, etc.).
    It must also be understood that each ad we see on the web generally has multiple data collectors associated with it, each with different reporting needs; while relatively few data collectors are involved in ad targeting, each collector will generally log data like cookies and IP addresses and may store it for indeterminate amounts of time for reporting purposes.
    Terabyte hard drives are now well under $100, making the cost of retaining event-level reporting data negligible. From an online privacy perspective, we should be thinking in terms of giving consumers information about both targeting and reporting so they can make informed decisions about how they share their data and what services they get in return.

  • http://techliberation.com/author/berinszoka/ Berin Szoka

    Well said.
