A couple of Google lawyers have announced on the Google blog that the company will be making the data from its server logs “much more anonymous, so that it can no longer be identified with individual users, after 18-24 months.” That’s a big, important change, as Google’s privacy policy has never before pledged to destroy or anonymize data about all of our searches.
Now, there are some interesting details – details that are highlighted by the text I quoted above. “Anonymous” is correctly regarded as an absolute condition. Like pregnancy, anonymity is either there or it’s not. Modifying the word with a relative adjective like “more” is a curious use of language.
If Google is going to anonymize data rather than destroy it, its challenge is to make sure that a person’s identity and behavior cannot be reconstructed from what remains. As AOL’s fiasco with releasing “anonymized” search data showed, clipping off the obvious identifiers won’t do it. As data mining capabilities advance, anonymizing techniques will have to keep ahead of them.
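To make the point about clipped identifiers concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the log rows are invented, and `clip_identifiers` and `profiles` are hypothetical helpers, not a description of how AOL or Google actually processed their logs. It swaps each user name for an opaque number and then shows that the queries can still be gathered back into one person's profile.

```python
from collections import defaultdict

# Hypothetical log rows: (user name, query). All values are invented.
raw_log = [
    ("alice", "landscapers in smalltown"),
    ("bob", "cheap flights"),
    ("alice", "homes sold on oak street smalltown"),
    ("alice", "dr. alice example smalltown"),
]

def clip_identifiers(log):
    """Replace each user name with an opaque number, leaving queries intact."""
    pseudonyms = {}
    clipped = []
    for user, query in log:
        # The same user always maps to the same number.
        pid = pseudonyms.setdefault(user, len(pseudonyms) + 1)
        clipped.append((pid, query))
    return clipped

def profiles(clipped):
    """Group queries by pseudonym, as anyone holding the released data could."""
    grouped = defaultdict(list)
    for pid, query in clipped:
        grouped[pid].append(query)
    return dict(grouped)

clipped = clip_identifiers(raw_log)
for pid, queries in profiles(clipped).items():
    print(pid, queries)
```

The user-name column is gone, but one pseudonym still ties together a town, a street, and a name typed into the search box, which may be all a curious reader needs.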
There are interesting things that can be done to synthesize data, making it statistically relevant while factually incoherent. Hopefully, Google will sic some of its finest famously-smarty-pants engineers on the task of making their anonymous data really, really anonymous.
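One very simple way to picture "statistically relevant while factually incoherent" is to shuffle each column of a table independently. The Python sketch below is an illustration under stated assumptions, not Google's method or a proven anonymization technique: `shuffle_columns` is a hypothetical helper and the rows are invented. Each column keeps exactly the same counts of values, but the rows no longer describe any real event.

```python
import random
from collections import Counter

def shuffle_columns(rows, seed=0):
    """Shuffle every column independently of the others.

    Per-column value counts are preserved; the pairing of values
    within a row is scrambled.
    """
    rng = random.Random(seed)
    columns = [list(col) for col in zip(*rows)]
    for col in columns:
        rng.shuffle(col)
    return [tuple(row) for row in zip(*columns)]

# Hypothetical rows: (region, hour of day, query topic). All values invented.
rows = [
    ("north", 9, "weather"),
    ("south", 14, "flights"),
    ("north", 22, "recipes"),
    ("east", 9, "weather"),
]

synthetic = shuffle_columns(rows)

# The count of each value in each column is unchanged.
for i in range(3):
    assert Counter(r[i] for r in rows) == Counter(r[i] for r in synthetic)
print(synthetic)
```

This naive version has obvious limits: it throws away the relationships between columns, and a single revealing value (a name typed into a query, say) is still there after shuffling. Really, really anonymous data would take considerably more than this.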