data mining – Technology Liberation Front https://techliberation.com Keeping politicians' hands off the Net & everything else related to technology

Good News! Online Tracking is Slightly Boring https://techliberation.com/2011/03/10/good-news-online-tracking-is-slightly-boring/ Thu, 10 Mar 2011 16:13:17 +0000

You have to wade through a lot to reach the good news at the end of Time reporter Joel Stein’s article about “data mining”—or at least data collection and use—in the online world. There’s some fog right there: what he calls “data mining” is actually ordinary one-to-one correlation of bits of information, not mining historical data to generate patterns that are predictive of present-day behavior. (See my data mining paper with Jeff Jonas to learn more.) There is some data mining in and among the online advertising industry’s use of the data consumers emit online, of course.

Next, get over Stein’s introductory language about the “vast amount of data that’s being collected both online and off by companies in stealth.” That’s some kind of stealth if a reporter can write a thorough and informative article in Time magazine about it. Does the moon rise “in stealth” if you haven’t gone outside at night and looked at the sky? Perhaps so.

Now take a hard swallow as you read about Senator John Kerry’s (D-Mass.) plans for government regulation of the information economy.

Kerry is about to introduce a bill that would require companies to make sure all the stuff they know about you is secured from hackers and to let you inspect everything they have on you, correct any mistakes and opt out of being tracked. He is doing this because, he argues, “There’s no code of conduct. There’s no standard. There’s nothing that safeguards privacy and establishes rules of the road.”

Securing data from hackers and letting people inspect and correct data about them pull in opposite directions. If you’re going to make data about people available to them, you’re going to create opportunities for other people (it won’t even take hacking skills, really) to impersonate them, gather private data, and scramble data sets.

If Senator Kerry’s argument for government regulation is that there aren’t yet “rules of the road” pointing us off that cliff, I’ll take market regulation. Drivers like you and me are constantly and spontaneously writing the rules through our actions and inactions, clicks and non-clicks, purchases and non-purchases.

There are other quibbles. “Your political donations, home value and address have always been public,” says Stein, “but you used to have to actually go to all these different places — courthouses, libraries, property-tax assessors’ offices — and request documents.”

This is correct insofar as it describes the modern decline in practical obscurity. But your political donations were not public records before the Federal Election Campaign Act of 1971 and its 1974 amendments. That’s when the federal government started subordinating this particular dimension of your privacy to others’ collective values.

But these pesky details can be put aside. The nuggets of wisdom in the article predominate!

“Since targeted ads are so much more effective than nontargeted ones,” Stein writes, “websites can charge much more for them. This is why — compared with the old banners and pop-ups — online ads have become smaller and less invasive, and why websites have been able to provide better content and still be free.”

The Internet is a richer, more congenial place because of ads targeted for relevance.

And the conclusion of the article is a dose of smart, well-placed optimism that contrasts with Senator Kerry’s sloppy FUD.

We’re quickly figuring out how to navigate our trail of data — don’t say anything private on a Facebook wall, keep your secrets out of e-mail, use cash for illicit purchases. The vast majority of it, though, is worthless to us and a pretty good exchange for frequent-flier miles, better search results, a fast system to qualify for credit, finding out if our babysitter has a criminal record and ads we find more useful than annoying. Especially because no human being ever reads your files. As I learned by trying to find out all my data, we’re not all that interesting.

Consumers are learning how to navigate the online environment. They are not menaced or harmed by online tracking. Indeed, commercial tracking is congenial and slightly boring. That’s good news that you rarely hear from media or politicians because good news doesn’t generally sell magazines or legislation.

Crime-Prediction Software? https://techliberation.com/2010/08/24/crime-prediction-software/ Tue, 24 Aug 2010 17:54:50 +0000

It sounds a little bit like the “pre-crime” unit featured in the 2002 film “Minority Report,” but news that Washington, D.C. will implement software to “predict” crime is not quite as worrisome as it might seem at first blush.

Beginning several years ago, the researchers assembled a dataset of more than 60,000 various crimes, including homicides. Using an algorithm they developed, they found a subset of people much more likely to commit homicide when paroled or probated. Instead of finding one murderer in 100, the UPenn researchers could identify eight future murderers out of 100. Berk’s software examines roughly two dozen variables, from criminal record to geographic location. The type of crime, and more importantly, the age at which that crime was committed, were two of the most predictive variables.

Unlike applying data mining to detection of terrorism planning or preparation, which is exceedingly rare, using tens of thousands of examples of recidivism to discover predictive factors is a good way to focus supervision resources where they are most likely to be effective.
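
The arithmetic behind that contrast can be sketched in a few lines. The 1-in-100 and 8-in-100 figures come from the quoted article; the sensitivity, false-positive rate, and the one-in-a-million base rate below are hypothetical numbers chosen only to show how precision collapses when the thing being predicted is extremely rare.

```python
def precision(base_rate, sensitivity, false_positive_rate):
    """Share of flagged people who are true positives (Bayes' rule)."""
    true_pos = base_rate * sensitivity
    false_pos = (1.0 - base_rate) * false_positive_rate
    return true_pos / (true_pos + false_pos)

# Figures quoted in the article: 8 in 100 in the flagged subset vs. 1 in 100 overall.
lift = 0.08 / 0.01

# Hypothetical classifier quality, held constant across both scenarios.
SENSITIVITY = 0.9
FALSE_POSITIVE_RATE = 0.05

# A behavior with many examples (assumed 1-in-100 base rate).
common = precision(0.01, SENSITIVITY, FALSE_POSITIVE_RATE)

# An exceedingly rare behavior (assumed 1-in-a-million base rate).
rare = precision(1e-6, SENSITIVITY, FALSE_POSITIVE_RATE)

print(f"lift over the base rate: {lift:.0f}x")
print(f"precision, common behavior: {common:.1%}")
print(f"precision, rare behavior:   {rare:.4%}")
```

With the same assumed classifier, the flags for the common behavior are right a meaningful share of the time, while nearly every flag for the rare behavior is a false positive.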

The article describes use of this software for monitoring parolees and probationers. Using data mining to justify anything approaching extra punishment would be a misuse, and many far more difficult issues would arise if it were used on the general population.

Obama Administration Data Mining Social Networks: Privacy Threat or Overblown Hyperbole? https://techliberation.com/2009/09/02/obama-administration-data-mining-social-networks-privacy-threat-or-overblown-hyperbole/ Thu, 03 Sep 2009 03:13:28 +0000

A number of conservative blogs have picked up on reports that the Obama administration is looking to data mine users on social networking sites. Reports CNS News:

Anyone who posts comments on the White House’s Facebook, MySpace, YouTube and Twitter pages will have their statements captured and permanently archived by the federal government, according to a plan that the White House is now seeking a contractor to carry out.

Whenever government is collecting information about private citizens, we should be concerned. But this controversy smells a lot like privacy fear-mongering, even though it involves government. If you post a comment to an “official” Obama administration page on a social networking site, it seems only natural that it’s fair game for data mining. The same goes if you post a video response on a publicly accessible site.

If you’re posting controversial statements online under your real name for the public to see, what do you expect will happen? Anybody in the world who has an Internet connection can log your postings, so why shouldn’t government officials be able to do the same? Until government starts pressuring Facebook or MySpace to hand over data that’s being collected on an involuntary basis, I don’t see a whole lot here to worry about.

This controversy, and the flap over flag@whitehouse.gov from a few weeks back, raise another interesting question: should Congress reexamine the Presidential Records Act (PRA) of 1978? This is the law that governs Presidential record-keeping. According to some commentators, if the administration solicits data on its critics, it is obligated under the PRA to retain that data indefinitely. I haven’t read the law, but at first glance it appears that it may have some serious deficiencies. This is hardly surprising, of course, given that the Internet, let alone social networks, didn’t even exist when the PRA was enacted in 1978.

600 Billion Data Points Per Day? It’s Time to Restore the Fourth Amendment https://techliberation.com/2009/08/17/600-billion-data-points-per-day-it%e2%80%99s-time-to-restore-the-fourth-amendment/ Mon, 17 Aug 2009 19:04:14 +0000

Jeff Jonas has published an important post: “Your Movements Speak for Themselves: Space-Time Travel Data is Analytic Super-Food!”

More than you probably realize, your mobile device is a digital sensor, creating records of your whereabouts and movements:

Mobile devices in America are generating something like 600 billion geo-spatially tagged transactions per day. Every call, text message, email and data transfer handled by your mobile device creates a transaction with your space-time coordinate (to roughly 60 meters accuracy if there are three cell towers in range), whether you have GPS or not. Got a Blackberry? Every few minutes, it sends a heartbeat, creating a transaction whether you are using the phone or not. If the device is GPS-enabled and you’re using a location-based service your location is accurate to somewhere between 10 and 30 meters. Using Wi-Fi? It is accurate below 10 meters.

The process of deploying this data to markedly improve our lives is underway. A friend of Jonas’ says that space-time travel data used to reveal traffic tie-ups shaves two to four hours off his commute each week. When it is put to full use, “the world we live in will fundamentally change. Organizations and citizens alike will operate with substantially more efficiency. There will be less carbon emissions, increased longevity, and fewer deaths.”

This progress is not without cost:

A government not so keen on free speech could use such data to see a crowd converging towards a protest site and respond before the swarm takes form — detected and preempted, this protest never happens. Or worse, it could be used to understand and then undermine any political opponent.

Very few want government to be able to use this data as Jonas describes, and not everybody wants to participate in the information economy quite so robustly. But the public can’t protect itself against what it can’t see. So Jonas invites holders of space-time data to reveal it:

[O]ne way to enlighten the consumer would involve holders of space-time-travel data [permitting] an owner of a mobile device the ability to also see what they can see:

(a) The top 10 places you spend the most time (e.g., 1. a home address, 2. a work address, 3. a secondary work facility address, 4. your kids school address, 5. your gym address, and so on);

(b) The top three most predictable places you will be at a specific time when on the move (e.g., Vegas on the 215 freeway passing the Rainbow exit on Thursdays 6:07 – 6:21pm — 57% of the time);

(c) The first name and first letter of the last name of the top 20 people that you regularly meet-up with (turns out to be wife, kids, best friends, and co-workers – and hopefully in that order!)

(d) The best three predictions of where you will be for more than one hour (in one place) over the next month, not counting home or work.
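
As a rough illustration of item (a), here is a minimal sketch of how a ranking of most-visited places might be computed from time-stamped location records. The record format, the place labels, and the `top_places` helper are all hypothetical; this is not a description of how Jonas or any mobile provider actually processes such data.

```python
from collections import Counter
from datetime import datetime

def top_places(pings, n=10):
    """Rank places by total hours spent, given (ISO timestamp, place label) pings.

    Each gap between consecutive pings is credited to the place of the
    earlier ping; the final ping has no following gap and adds no time.
    """
    ordered = sorted((datetime.fromisoformat(ts), place) for ts, place in pings)
    dwell = Counter()
    for (start, place), (end, _) in zip(ordered, ordered[1:]):
        dwell[place] += (end - start).total_seconds() / 3600.0
    return dwell.most_common(n)

# Hypothetical pings for one device over one day.
pings = [
    ("2009-08-17T08:00:00", "home"),
    ("2009-08-17T09:00:00", "work"),
    ("2009-08-17T17:00:00", "gym"),
    ("2009-08-17T18:00:00", "home"),
    ("2009-08-17T23:00:00", "home"),
]

ranking = top_places(pings, n=3)
for place, hours in ranking:
    print(f"{place}: {hours:.1f} h")
```

Even this toy version shows why the data is so revealing: a handful of pings per day is enough to separate home, work, and routine stops.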

Google’s Android and Latitude products are candidates to take the lead, he says, and I agree. Google collectively understands both openness and privacy, and it’s nimble enough still to execute something like this. Other mobile providers would be forced to follow this innovation.

What should we do to reap the benefits while minimizing the costs? The starting point is you: It is your responsibility to deal with your mobile provider as an adult. Have you read your contract? Have you asked them whether they collect this data, how long they keep it, whether they share it, and under what terms?

Think about how you can obscure yourself. Put your phone in airplane mode when you are going someplace unusual – or someplace usual. (You might find that taking a break from being connected opens new vistas in front of your eyes.) Trade phones with others from time to time. There are probably hacks for mobile phone systems that could allow people to protect themselves to some degree.

Privacy self-help is important, but obviously it can be costly. And you shouldn’t have to obscure yourself from your mobile communications provider, giving up the benefits of connected living, to maintain your privacy from government.

The emergence of space-time travel data begs for restoration of Fourth Amendment protections in communications data. In my American University Law Review article, “Reforming Fourth Amendment Privacy Doctrine,” I described the sorry state of the Fourth Amendment as to modern communications.

The “reasonable expectation of privacy” doctrine that arose out of the Supreme Court’s 1967 Katz decision is wrong; it isn’t even founded in the majority holding of the case. The “third-party doctrine,” following Katz in a pair of 1970s Bank Secrecy Act cases, denies individuals Fourth Amendment claims on information held by service providers. Smith v. Maryland brought it home to communications in 1979, holding that people do not have a “reasonable expectation of privacy” in the telephone numbers they dial. (Never mind that they actually have privacy; the doctrine trumps it.)

Concluding, apropos of Jonas’ post, I wrote:

These holdings were never right, but they grow more wrong with each step forward in modern, connected living. Incredibly deep reservoirs of information are constantly collected by third-party service providers today.

Cellular telephone networks pinpoint customers’ locations throughout the day through the movement of their phones. Internet service providers maintain copies of huge swaths of the information that crosses their networks, tied to customer identifiers. Search engines maintain logs of searches that can be correlated to specific computers and usually the individuals that use them. Payment systems record each instance of commerce, and the time and place it occurred.

The totality of these records are very, very revealing of people’s lives. They are a window onto each individual’s spiritual nature, feelings, and intellect. They reflect each American’s beliefs, thoughts, emotions, and sensations. They ought to be protected, as they are the modern iteration of our “papers and effects.”

Government Data Mining: The Need for a Legal Framework https://techliberation.com/2008/11/03/government-data-mining-the-need-for-a-legal-framework/ Mon, 03 Nov 2008 22:03:49 +0000

Indiana University law professor Fred Cate writes with characteristic thoroughness and organization in his article Government Data Mining: The Need for a Legal Framework, published in the Harvard Civil Rights-Civil Liberties Law Review this summer.

It took me a while to get around to reading it – a little longer to write it up. Don’t make the same mistakes I did! It’s good!

Here’s a snippet from the abstract:

The article describes the extraordinary volume and variety of personal data to which the government has routine access, directly and through industry, and examines the absence of any meaningful limits on that access. So-called privacy statutes are often so outdated and inadequate that they fail to limit the government’s access to our most personal data, or they have been amended in the post-9/11 world to reduce those limits. And the Fourth Amendment, the primary constitutional guarantee of individual privacy, has been interpreted by the Supreme Court to not apply to routine data collection, accessing data from third parties, or sharing data, even if illegally gathered.

Professor Cate spends a good deal of time on the Supreme Court’s pernicious “third party doctrine,” which exempts information shared with a third party (think of ISPs, banks, etc.) from Fourth Amendment protection. This rule was bad when it was written and it grows worse and worse as we move our lives further and further online.

Oh, there are details from the paper I would have treated differently. He mistakenly says the 9/11 terrorists used false ID. (Fraudulently gotten, yes. False identities, no.) And he omits the Federal Agency Data Mining Reporting Act of 2007, passed as §804 of the Implementing Recommendations of the 9/11 Commission Act of 2007 (Public Law 110-53). But these are trivial issues with a paper that is excellent overall.

Poking around among the Internets to confirm this and that detail, I found this post saying that Professor Cate authored much of a recent report called “Protecting Individual Privacy in the Struggle Against Terrorists.” It’s also very good stuff.

Fred Cate, people!

One of the bright lights.
