Good Stuff from Google / The Internet is Not a Cloud!

by on August 9, 2007 · 4 comments

There are no two ways about it: Google is doing good things on privacy.

The video below gives ordinary people important information about what happens when they search, and the awareness they need to protect their privacy. To those of us who are technically aware, the information presented here is a little obvious, but the average Internet user doesn’t know it. They need to.

Over the long haul, this kind of education will be much more effective protection for consumers than privacy regulation – and it will have none of the costs of regulation: wasted tax dollars, market-distorting rent-seeking, regulatory capture, and so on.

The video raises some important new points and questions, of course:


Foremost – the Internet is not a cloud! It is a network of telecommunications providers, including, most importantly, ISPs. They have lots of information about where we go online – and the ability to learn much more. When will an ISP come forward with the kind of transparency and good corporate citizenship that Google is showing here?

A couple of critiques of the Google presentation:

The assertion “logs don’t contain any truly personal information about you” is not necessarily true. It’s contingent on what searches you’ve done. As AOL’s recent gaffe in distributing raw search logs demonstrated, people’s searches can be used to identify them. Each individual search is not necessarily identifiable, but a group of searches often will be, and it will grow more identifiable with the development of better data mining techniques and the collection of more data in more places.
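To make the point about groups of searches concrete, here is a minimal sketch in Python. The log entries, user labels, and the `users_matching` helper are all invented for illustration; nothing here comes from a real search log. It simply counts how many users in a toy log match an ever longer list of queries.

```python
from collections import defaultdict

# Hypothetical toy log of (pseudonymous user label, query). All values are invented.
LOG = [
    ("u1", "pizza near me"),
    ("u2", "pizza near me"),
    ("u3", "pizza near me"),
    ("u1", "landscapers in springfield"),
    ("u2", "landscapers in springfield"),
    ("u1", "rare orchid society newsletter"),
]


def users_matching(log, queries):
    """Return the set of users whose history contains every query in `queries`."""
    history = defaultdict(set)
    for user, query in log:
        history[user].add(query)
    wanted = set(queries)
    # A user matches only if all wanted queries appear in that user's history.
    return {user for user, seen in history.items() if wanted <= seen}


if __name__ == "__main__":
    combo = []
    for query in ["pizza near me",
                  "landscapers in springfield",
                  "rare orchid society newsletter"]:
        combo.append(query)
        print(len(combo), "queries ->", len(users_matching(LOG, combo)), "matching users")
```

In this toy data, each added query shrinks the pool of matching users until only one remains, even though no user label carries a name. Real logs are far larger, but the narrowing effect is the same in kind.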

The redaction of IP addresses and cookie numbers is a step forward, but far from great anonymization. Recall my observation when this step was announced: “much more anonymous” was a curious use of language – qualifying an absolute (“anonymous”) with a relative adjective like “more.” Shortening IP addresses and pulverizing cookie numbers will make it harder to re-identify data, but not very hard in many cases. The problem is with the identifying search terms people use.
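A rough sketch of what this kind of redaction might look like, in Python. The details are assumptions: I don’t know exactly how Google shortens addresses or scrambles cookies, so this version simply zeroes the last 8 bits of an IPv4 address and swaps the cookie for a fresh random token. The record layout and the function names `shorten_ipv4` and `scrub_record` are hypothetical.

```python
import ipaddress
import secrets


def shorten_ipv4(address: str, keep_bits: int = 24) -> str:
    """Zero out the low bits of an IPv4 address (assumed scheme: keep the top 24 bits)."""
    network = ipaddress.ip_network(f"{address}/{keep_bits}", strict=False)
    return str(network.network_address)


def scrub_record(record: dict) -> dict:
    """Return a copy of a hypothetical log record with IP shortened and cookie replaced."""
    scrubbed = dict(record)
    scrubbed["ip"] = shorten_ipv4(record["ip"])
    # Replace the cookie with an unrelated random token so it no longer links records.
    scrubbed["cookie"] = secrets.token_hex(8)
    return scrubbed


if __name__ == "__main__":
    # Invented example record; 192.0.2.0/24 is a documentation-only address range.
    record = {"ip": "192.0.2.123", "cookie": "abc123", "query": "landscapers in springfield"}
    print(scrub_record(record))
```

Note what the sketch leaves alone: the `query` field passes through untouched. That is the weakness described above. The address and cookie are blurred, but the search terms themselves can still point back to a person.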

Google could do much more for anonymization by getting into synthesized data, a concept I first learned about from folks at the U.S. Census. Here, for example, is a paper that includes discussion of – love it – “inference-valid synthetic microdata.” That is information in which the relationships among bits of data are useful for the particular interests of the researcher, but not otherwise. And here is a PowerPoint presentation that conveys a surprisingly complete (for a PPT) picture of the issues. (Note that someone from Google is included in the “colleagues” page.)

Synthesized data is not a magic bullet, but it would be a big step forward. Google may have to do some of its product development research using synthesized data collected prospectively for particular projects, rather than maintaining wells of identifiable information for whatever might come up.

All this shouldn’t detract from this important, positive step from Google on privacy. As I said before, hopefully, Google will sic some of its finest famously-smarty-pants engineers on the task of making their anonymous data really, really anonymous.

  • http://lippard.blogspot.com/ Jim Lippard

    “The assertion “logs don’t contain any truly personal information about you” is not necessarily true. It’s contingent on what searches you’ve done. As AOL’s recent gaffe in distributing raw search logs demonstrated, people’s searches can be used to identify them. Each individual search is not necessarily identifiable, but a group of searches often will be, and it will grow more identifiable with the development of better data mining techniques and the collection of more data in more places.”

    And such data is at high risk of being intercepted without judicial review. The 9th Circuit ruling in U.S. v. Forrester says there’s no 4th Amendment protection for email to/from fields or web URLs, even though this is in the packet data payload, not the packet headers. Customer traffic containing such data can be intercepted at the ISP and supplied to law enforcement without a court order.

  • Patrick McKinnon

    It seems to me that Google is glossing over the implications of having a cookie by simply admitting that they collect it.

    Google is no longer just a search engine, but a service provider (email, documents, checkout, domain management), and each of these services requires varying degrees of your personal information. Gmail, for example, requires your cell phone number (for verification) and obviously has your email address, while Checkout requires your address and bank account information.

    What non-technologists might not realize is that all of these services use the same cookie that Google collects whenever you perform a search query. Therefore, this seemingly harmless cookie that Google mentions can actually be used to link your search queries to all of the other personal information Google knows about you whenever you perform a query on a machine where you have also been logged in to one of Google’s myriad other services. For some perspective, see http://google.com/searchhistory

    I’m not trying to imply that Google has devious motives for linking all this information together; in fact, by being able to do so, they are able to provide a personally tailored service, with more relevant search results.

    I do, however, think that this thinly veiled disclosure of the information that Google collects when you perform a search query can be misleading when viewed in too narrow a context.
