Mo’ Data Mo’ Problems

The most exciting promise of Big Data (and if you hate that term, you’re not alone, but I think we’re stuck with it now) is this: data collected on an increasingly gargantuan scale, run through modern data-processing and pattern-recognition algorithms, will unearth powerful new insights into our world and, especially, human behavior. Unfortunately, this is also its most worrying problem.

Right now, Big Data and privacy seem to be mortal foes. Personal data can reduce your car insurance premiums, at the price of privacy. It can provide valuable public health data, by capturing sensitive private health information. It can help the police track bad guys, by creating a facial-recognition panopticon with technology that is practically crying out to be abused. It can construct a meaningful narrative out of, say, all the pictures you’ve ever posted to the Internet, even if you didn’t intend that to happen.

These aren’t purely theoretical concerns. The New York Times reports:

Having eroded privacy for decades, shady, poorly regulated data miners, brokers and resellers have now taken creepy classification to a whole new level. They have created lists of victims of sexual assault, and lists of people with sexually transmitted diseases. Lists of people who have Alzheimer’s, dementia and AIDS. Lists of the impotent and the depressed.

There are lists of “impulse buyers.” Lists of suckers: gullible consumers who have shown that they are susceptible to “vulnerability-based marketing.”

Now imagine such lists augmented with people who have accidentally, implicitly shown that they are vulnerable: for example, through Facebook posts that algorithms interpret, in the context of all the posters’ other information, as evidence of secrets that users don’t want to reveal.

There are two basic problems here. One is that there are no standards for anonymizing and securing data. Organizations that collect and publish data anonymize and secure it only if and how they feel like it, on an ad hoc basis, and much “anonymized” data really isn’t — consider the badly anonymized NYC taxi data from earlier this year.
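To make the "really isn’t" concrete, here is a minimal Python sketch of one common way ad hoc anonymization fails: replacing an identifier with an unsalted hash when the space of possible identifiers is small. The taxi release was widely reported to have failed in roughly this way, but treat the details here as assumptions: the ID format (digit, letter, digit, digit), the sample ID "7B42", and the function names are all hypothetical and chosen purely for illustration.

```python
import hashlib
from itertools import product
from string import ascii_uppercase, digits


def pseudonymize(identifier: str) -> str:
    """Naive 'anonymization': an unsalted MD5 hash of the identifier."""
    return hashlib.md5(identifier.encode("utf-8")).hexdigest()


def build_reverse_table() -> dict:
    """Hash every possible ID in the assumed digit-letter-digit-digit format."""
    table = {}
    for d1, letter, d2, d3 in product(digits, ascii_uppercase, digits, digits):
        candidate = f"{d1}{letter}{d2}{d3}"
        table[pseudonymize(candidate)] = candidate
    return table


# A record as it might appear in a published "anonymized" data set.
released = pseudonymize("7B42")  # hypothetical ID

# Anyone can enumerate all 10 * 26 * 10 * 10 = 26,000 candidates in well
# under a second and look the published hash up in the resulting table.
reverse_table = build_reverse_table()
recovered = reverse_table[released]

print(len(reverse_table), recovered)
```

The point is not that MD5 in particular is weak. Any deterministic, unsalted transformation of a small, guessable set of identifiers can be undone by simple enumeration, which is one reason an agreed standard would beat each publisher improvising its own scheme.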

But there’s a deeper, far more fundamental issue: do people have the right to know when data about them is being collected? And when it is, should they, rather than the collectors, own that data? I give you MIT professor Alex Pentland and his proposed “New Deal on Data”:

Collectively, we now have data that could help green the environment, create transparent government, deal with pandemics, and, of course, lead to better workers and better service for customers. But obviously someone or some company can abuse that […] The New Deal would give people the ability to see what’s being collected and opt out or opt in. Imagine you had a dashboard that showed what your house knows about you and what it shares, and you could turn it off or on […] Transparency is key. The data being recorded about you will form a fairly complete picture of your life […] I don’t think companies realize that the costs of a “grab all the data” strategy are very high.

Realistically, though, this New Deal implies yet another chapter in the long, sad tale of the battle between innovation and regulation. I’m not opposed to the latter, but I am frequently frustrated by how slowly it evolves compared to the former. There’s little doubt that the exponential growth in our data-collection abilities can lead to enormous benefits, but there’s also little doubt that the population in general is already deeply concerned about technology’s inexorable (and almost accidental) war on privacy, and we’re only a disaster or two away from loud calls for stricter regulation.

It would behoove the tech industry to get ahead of this problem, beginning by defining and implementing technical standards for data anonymization. (I wouldn’t be surprised if this actually became a new sub-sub-industry.) Better yet, larger companies could agree on a voluntary equivalent of the New Deal on Data, in hopes of forestalling any cries for regulation. Better sooner than later. I don’t think the tech industry quite appreciates how creeped out the general public is by the state of data privacy, or rather the lack thereof. If we take it casually, we’re playing with fire.