Good Stuff from Google / The Internet is Not a Cloud!

August 9, 2007

There are no two ways about it: Google is doing good things on privacy.

The video below gives ordinary people important information, empowering them with the awareness they need to protect their privacy. To those of us who are technically aware, much of it is a little obvious, but the average Internet user doesn’t know it. They need to.

Over the long haul, this kind of education will protect consumers far more effectively than privacy regulation – and it will have none of the costs of regulation: wasted tax dollars, market-distorting rent-seeking, regulatory capture, and so on.

The video raises some important new points and questions, of course:

Foremost – the Internet is not a cloud! It is a network of telecommunications providers, including, most importantly, ISPs. They have lots of information about where we go online – and the ability to learn much more. When will an ISP come forward with the kind of transparency and good corporate citizenship that Google is showing here?

A couple of critiques of the Google presentation:

The assertion that “logs don’t contain any truly personal information about you” is not necessarily true; it depends on what searches you’ve done. As AOL’s recent gaffe in releasing raw search logs demonstrated, people’s searches can be used to identify them. A single search may not be identifiable, but a group of searches often will be, and such groups will grow more identifiable as data mining techniques improve and more data is collected in more places.
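To make the point concrete, here is a toy Python sketch. The population, its attributes, and the names `POPULATION`, `SEARCH_CLUES`, and `matching` are all hypothetical, invented for illustration; real re-identification works on far larger and messier data. Each clue a search might leak matches several people on its own, but the combination points to one.

```python
# Hypothetical toy population. Each person has a few attributes that a
# search query might leak: home town, surname, and a health concern.
POPULATION = [
    {"id": 1, "town": "Springfield", "surname": "Smith", "concern": "allergies"},
    {"id": 2, "town": "Springfield", "surname": "Jones", "concern": "allergies"},
    {"id": 3, "town": "Springfield", "surname": "Smith", "concern": "back pain"},
    {"id": 4, "town": "Shelbyville", "surname": "Smith", "concern": "allergies"},
    {"id": 5, "town": "Shelbyville", "surname": "Jones", "concern": "back pain"},
    {"id": 6, "town": "Springfield", "surname": "Brown", "concern": "allergies"},
    {"id": 7, "town": "Shelbyville", "surname": "Smith", "concern": "back pain"},
    {"id": 8, "town": "Springfield", "surname": "Jones", "concern": "back pain"},
]

# What three separate (hypothetical) searches by one user might reveal.
SEARCH_CLUES = {"town": "Springfield", "surname": "Smith", "concern": "allergies"}


def matching(population, clues):
    """Return the ids of everyone consistent with every clue given."""
    return {
        person["id"]
        for person in population
        if all(person.get(key) == value for key, value in clues.items())
    }


if __name__ == "__main__":
    # Each clue alone leaves a crowd to hide in...
    for key, value in SEARCH_CLUES.items():
        print(key, "alone:", sorted(matching(POPULATION, {key: value})))
    # ...but together the clues single out one person.
    print("all clues:", sorted(matching(POPULATION, SEARCH_CLUES)))
```

Run as a script, it reports five, four, and four candidates for the individual clues, and exactly one person (id 1) for all three together.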

The redaction of IP addresses and cookie numbers is a step forward, but far from thorough anonymization. Recall my observation when this step was announced: “much more anonymous” was a curious use of language – qualifying an absolute (“anonymous”) with a comparative like “more.” Shortening IP addresses and pulverizing cookie numbers will make it harder to re-identify data, but in many cases not very hard. The problem lies in the identifying search terms people use.
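Here is a rough Python sketch of what this kind of redaction might look like, and of what it leaves behind. The post doesn’t describe Google’s exact method, so the details are assumptions: dropping the last octet of an IPv4 address, and replacing each cookie ID with a salted SHA-256 hash whose salt is then thrown away. The function names and log entries are hypothetical. Note that the query text passes through untouched, and because one salt is used for the whole batch, a given user’s records still share a scrambled cookie and remain linkable to one another, which is exactly the group-of-searches problem.

```python
import hashlib
import ipaddress
import secrets


def truncate_ipv4(ip):
    """Keep only the /24 network, e.g. 203.0.113.57 -> 203.0.113.0."""
    network = ipaddress.ip_network(f"{ip}/24", strict=False)
    return str(network.network_address)


def scramble_cookie(cookie_id, salt):
    """Replace a cookie ID with a salted SHA-256 digest (hex string)."""
    return hashlib.sha256(salt + cookie_id.encode("utf-8")).hexdigest()


def redact(record, salt):
    """Hypothetical log redaction: coarsen the IP, scramble the cookie."""
    return {
        "ip": truncate_ipv4(record["ip"]),
        "cookie": scramble_cookie(record["cookie"], salt),
        "query": record["query"],  # the search terms are left exactly as typed
    }


# Hypothetical raw log entries from one user.
RAW_LOG = [
    {"ip": "203.0.113.57", "cookie": "abc123", "query": "landscapers in springfield"},
    {"ip": "203.0.113.57", "cookie": "abc123", "query": "smith family reunion"},
]

salt = secrets.token_bytes(16)  # one random salt for the whole batch
redacted = [redact(record, salt) for record in RAW_LOG]
del salt  # without the salt, a known cookie ID can't be re-hashed to find its records

for entry in redacted:
    print(entry)
```

Both redacted entries come out with the address 203.0.113.0 and the same 64-character digest, sitting next to search terms that are as revealing as ever.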

Google could do much more for anonymization by getting into synthesized data, a concept I first learned about from folks at the U.S. Census Bureau. Here, for example, is a paper that includes discussion of – love it – “inference-valid synthetic microdata.” That is, data in which the relationships among the pieces of information remain valid for the particular questions a researcher is asking, but not for other purposes. And here is a PowerPoint presentation that conveys a surprisingly complete (for a PPT) picture of the issues. (Note that someone from Google is included on the “colleagues” page.)
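For a feel for the idea, here is a deliberately crude Python sketch. It is not the Census Bureau’s technique and not anything Google does: it fits only the means, spreads, and correlation of two hypothetical variables, then draws brand-new records from a bivariate normal distribution with those parameters. A researcher interested in how the two variables move together gets a usable answer, yet no synthetic row is copied from a real record. The data and function names are made up for illustration; the methods in the literature are far more sophisticated, and a model this simple would miss most of the real structure in a dataset.

```python
import math
import random
import statistics

# Hypothetical "real" microdata: (age, searches per week) for eight people.
ORIGINAL = [(23, 40), (27, 38), (31, 35), (38, 32), (41, 24), (45, 22), (52, 18), (60, 12)]


def fit(pairs):
    """Estimate means, standard deviations, and the correlation of (x, y)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sx, sy = statistics.pstdev(xs), statistics.pstdev(ys)
    cov = sum((x - mx) * (y - my) for x, y in pairs) / len(pairs)
    return mx, my, sx, sy, cov / (sx * sy)


def synthesize(params, n, rng):
    """Draw n new (x, y) records from a bivariate normal with the fitted parameters."""
    mx, my, sx, sy, r = params
    records = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        x = mx + sx * z1
        # Mixing z1 into y gives x and y correlation r while keeping y's spread at sy.
        y = my + sy * (r * z1 + math.sqrt(max(0.0, 1.0 - r * r)) * z2)
        records.append((x, y))
    return records


params = fit(ORIGINAL)
synthetic = synthesize(params, 20000, random.Random(2007))
print("original fit: ", [round(p, 2) for p in params])
print("synthetic fit:", [round(p, 2) for p in fit(synthetic)])
```

The two printed lines should agree closely: the synthetic set supports questions about age and search frequency, but its rows are not anyone’s actual record.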

Synthesized data is not a magic bullet, but it would be a big step forward. Google may have to do some of its product development research using synthesized data collected prospectively for particular projects, rather than maintaining wells of identifiable information for whatever might come up.

All this shouldn’t detract from this important, positive step from Google on privacy. As I said before, I hope Google will sic some of its finest, famously smarty-pants engineers on the task of making its anonymous data really, really anonymous.
