
You have to wade through a lot to reach the good news at the end of Time reporter Joel Stein’s article about “data mining” (or at least data collection and use) in the online world. There’s some fog right there: what he calls “data mining” is actually ordinary one-to-one correlation of bits of information, not mining historical data to generate patterns that are predictive of present-day behavior. (See my data mining paper with Jeff Jonas to learn more.) Of course, the online advertising industry does do some data mining with the data consumers emit online.

Next, get over Stein’s introductory language about the “vast amount of data that’s being collected both online and off by companies in stealth.” That’s some kind of stealth if a reporter can write a thorough and informative article in Time magazine about it. Does the moon rise “in stealth” if you haven’t gone outside at night and looked at the sky? Perhaps so.

Now take a hard swallow as you read about Senator John Kerry’s (D-Mass.) plans for government regulation of the information economy.

Kerry is about to introduce a bill that would require companies to make sure all the stuff they know about you is secured from hackers and to let you inspect everything they have on you, correct any mistakes and opt out of being tracked. He is doing this because, he argues, “There’s no code of conduct. There’s no standard. There’s nothing that safeguards privacy and establishes rules of the road.”

Securing data from hackers and letting people correct mistakes in data about them are goals that pull in opposite directions. If you’re going to make data about people available to them, you’re going to create opportunities for other people (it won’t even take hacking skills, really) to impersonate them, gather private data, and scramble data sets.

It sounds a little bit like the “pre-crime” unit featured in the 2002 film “Minority Report,” but news that Washington, D.C. will implement software to “predict” crime is not quite as worrisome as it might seem at first blush.

Beginning several years ago, the researchers assembled a dataset of more than 60,000 various crimes, including homicides. Using an algorithm they developed, they found a subset of people much more likely to commit homicide when paroled or probated. Instead of finding one murderer in 100, the UPenn researchers could identify eight future murderers out of 100. Berk’s software examines roughly two dozen variables, from criminal record to geographic location. The type of crime, and more importantly, the age at which that crime was committed, were two of the most predictive variables.
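The comparison in that excerpt, one homicide in 100 at baseline versus eight in 100 within the flagged subset, is a ratio of hit rates, often called lift. Here is a minimal Python sketch of that arithmetic, using only the figures quoted in the article. The function name `lift` and its parameters are hypothetical and illustrative; they are not taken from Berk’s software.

```python
from fractions import Fraction


def lift(hits_in_subset: int, subset_size: int,
         hits_in_base: int, base_size: int) -> Fraction:
    """Hypothetical helper: hit rate in the flagged subset divided by
    the baseline hit rate. Fraction keeps the result exact."""
    subset_rate = Fraction(hits_in_subset, subset_size)
    base_rate = Fraction(hits_in_base, base_size)
    return subset_rate / base_rate


# Figures quoted in the article: 1 in 100 at baseline, 8 in 100 when flagged.
ratio = lift(8, 100, 1, 100)
print(ratio)  # 8

# The same figures mean most flagged people would not commit homicide.
flagged = 100
non_offenders_flagged = flagged - 8
print(non_offenders_flagged)  # 92
```

An eightfold improvement in focus still leaves 92 of every 100 flagged parolees or probationers who would not go on to kill. That is useful for directing supervision, but it is a weak basis for anything resembling extra punishment.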

Terrorism planning and preparation are exceedingly rare, which makes them a poor target for data mining. Recidivism is different: using tens of thousands of examples of it to discover predictive factors is a good way to focus supervision resources where they are most likely to be effective.

The article describes use of this software for monitoring parolees and probationers. Using data mining to justify anything approaching extra punishment would be a misuse, and many far more difficult issues would arise if it were used on the general population.

A number of conservative blogs have picked up on reports that the Obama administration is looking to data mine users on social networking sites. Reports CNS News:

Anyone who posts comments on the White House’s Facebook, MySpace, YouTube and Twitter pages will have their statements captured and permanently archived by the federal government, according to a plan that the White House is now seeking a contractor to carry out.

Whenever government is collecting information about private citizens, we should be concerned. But this controversy smells a lot like privacy fear-mongering, even though it involves government. If you post a comment to an “official” Obama administration page on a social networking site, it seems only natural that it’s fair game for data mining. The same goes if you post a video response on a publicly accessible site.

If you’re posting controversial statements online under your real name for the public to see, what do you expect will happen? Anybody in the world who has an Internet connection can log your postings, so why shouldn’t government officials be able to do the same? Until government starts pressuring Facebook or MySpace to hand over data that’s being collected on an involuntary basis, I don’t see a whole lot here to worry about.

This controversy, and the flap over flag@whitehouse.gov from a few weeks back, raise another interesting question: should Congress reexamine the Presidential Records Act (PRA) of 1978? This is the law that governs Presidential record-keeping. According to some commentators, if the administration solicits data on its critics, it is obligated under the PRA to retain that data indefinitely. I haven’t read the law, but at first glance it appears that it may have some serious deficiencies. This is hardly surprising, of course, given that the public Internet, let alone social networks, didn’t even exist when the PRA was enacted in 1978.

Jeff Jonas has published an important post: “Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!”

More than you probably realize, your mobile device is a digital sensor, creating records of your whereabouts and movements:

Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day. Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate (to roughly 60 meters accuracy if there are three cell towers in range), whether you have GPS or not. Got a Blackberry? Every few minutes, it sends a heartbeat, creating a transaction whether you are using the phone or not. If the device is GPS-enabled and you’re using a location-based service your location is accurate to somewhere between 10 and 30 meters. Using Wi-Fi? It is accurate below 10 meters.

The process of deploying this data to markedly improve our lives is underway. A friend of Jonas’ says that space-time travel data used to reveal traffic tie-ups shaves two to four hours off his commute each week. When it is put to full use, “the world we live in will fundamentally change. Organizations and citizens alike will operate with substantially more efficiency. There will be less carbon emissions, increased longevity, and fewer deaths.”

This progress is not without cost.

Indiana University law professor Fred Cate writes with characteristic thoroughness and organization in his article “Government Data Mining: The Need for a Legal Framework,” published in the Harvard Civil Rights-Civil Liberties Law Review this summer.

It took me a while to get around to reading it, and a little longer to write it up. Don’t make the same mistake I did! It’s good!

Here’s a snippet from the abstract:

The article describes the extraordinary volume and variety of personal data to which the government has routine access, directly and through industry, and examines the absence of any meaningful limits on that access. So-called privacy statutes are often so outdated and inadequate that they fail to limit the government’s access to our most personal data, or they have been amended in the post-9/11 world to reduce those limits. And the Fourth Amendment, the primary constitutional guarantee of individual privacy, has been interpreted by the Supreme Court to not apply to routine data collection, accessing data from third parties, or sharing data, even if illegally gathered.

Professor Cate spends a good deal of time on the Supreme Court’s pernicious “third party doctrine,” which exempts information shared with a third party (think of ISPs, banks, etc.) from Fourth Amendment protection. This rule was bad when it was written and it grows worse and worse as we move our lives further and further online.

Oh, there are details from the paper I would have treated differently. He mistakenly says the 9/11 terrorists used false ID. (Fraudulently gotten, yes. False identities, no.) And he omits the Federal Agency Data Mining Reporting Act of 2007, passed as §804 of the Implementing Recommendations of the 9/11 Commission Act of 2007 (Public Law 110-53). But these are trivial issues with a paper that is excellent overall.

Poking around among the Internets to confirm this and that detail, I found this post saying that Professor Cate authored much of a recent report called “Protecting Individual Privacy in the Struggle Against Terrorists.” It’s also very good stuff.

Fred Cate, people!

One of the bright lights.