The National Security Agency likes to claim that intelligence officers collect only the phone records of millions of Americans, safely omitting their actual names from analysis. But a Stanford researcher, Jonathan Mayer, found that he and his co-author could easily match so-called “metadata” to individual names with little more than a Google search.
“If a few academic researchers can get this far this quickly, it’s difficult to believe the NSA would have any trouble identifying the overwhelming majority of American phone numbers,” they wrote.
Using a crowdsourced public database of voluntarily submitted phone records, MetaPhone…
We randomly sampled 5,000 numbers from our crowdsourced MetaPhone dataset and queried the Yelp, Google Places, and Facebook directories. With little marginal effort and just those three sources—all free and public—we matched 1,356 (27.1%) of the numbers. Specifically, there were 378 hits (7.6%) on Yelp, 684 (13.7%) on Google Places, and 618 (12.3%) on Facebook.
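The counts in that passage can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted above and prints each rate to two decimal places. It assumes that each directory's hit count refers to distinct phone numbers and that the 1,356 figure counts each matched number once. Under that assumption, the per-source hits add up to more than the unique matches, and the difference is the number of redundant hits from numbers found in more than one directory.

```python
# Figures quoted in the MetaPhone passage above.
SAMPLE_SIZE = 5_000
UNIQUE_MATCHES = 1_356
HITS = {"Yelp": 378, "Google Places": 684, "Facebook": 618}


def rate(count: int, total: int = SAMPLE_SIZE) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


def redundant_hits(hits: dict, unique: int) -> int:
    """Hits beyond the first for numbers found in several directories.

    Assumption: each directory's count is of distinct numbers, and
    `unique` counts every matched number exactly once.
    """
    return sum(hits.values()) - unique


if __name__ == "__main__":
    for source, count in HITS.items():
        print(f"{source}: {count} hits = {rate(count):.2f}% of the sample")
    print(f"Any source: {UNIQUE_MATCHES} numbers = {rate(UNIQUE_MATCHES):.2f}%")
    print(f"Redundant hits: {redundant_hits(HITS, UNIQUE_MATCHES)}")
```

The three directories return 1,680 hits in total but only 1,356 distinct numbers, so roughly a quarter of the sample is covered several times over by free sources.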
What if an organization were willing to put in some manpower?
To conservatively approximate human analysis, we randomly sampled 100 numbers from our dataset, then ran Google searches on each. In under an hour, we were able to associate an individual or a business with 60 of the 100 numbers. When we added in our three initial sources, we were up to 73.
The science of identifying people from supposedly anonymous databases has become a game for academics. Last year, a group of researchers showed that they could identify individuals by combining a DNA database of their relatives with public demographic information.
On the more invasive side, other researchers showed they could predict Facebook users’ sexual orientation from the pages they “like”.
“Even if you think you’re keeping your information private, we can learn a lot about you,” said Jennifer Golbeck, a University of Maryland computer scientist whose research resembles the Facebook study.
Statistically, it’s not hard to do. There are only so many short, 31-year-old Jewish writers in San Francisco (actually, come to think of it, there are probably a lot of people like me in this city). While an algorithm may not be able to identify every single person, it can dramatically narrow the search to the point where it’s easy for a determined person to find the information they need.
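The narrowing described above can be sketched as a sequence of filters. In the minimal Python sketch below, the `Person` record, its fields, and the eight-person `POPULATION` are all hypothetical, invented purely for illustration. The function applies one known attribute at a time and reports how many people still match after each step. The final count is the size of the group a target hides in, sometimes called an anonymity set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical attributes chosen to mirror the example in the text.
    height: str
    age: int
    religion: str
    job: str
    city: str


# Hypothetical toy population, invented for illustration; not real data.
POPULATION = [
    Person("short", 31, "Jewish", "writer", "San Francisco"),
    Person("short", 31, "Jewish", "writer", "San Francisco"),
    Person("tall", 31, "Jewish", "writer", "San Francisco"),
    Person("short", 31, "Catholic", "writer", "San Francisco"),
    Person("short", 45, "Jewish", "teacher", "San Francisco"),
    Person("short", 31, "none", "engineer", "Oakland"),
    Person("tall", 28, "Jewish", "writer", "Oakland"),
    Person("short", 31, "Jewish", "nurse", "San Francisco"),
]


def narrow(population, clues):
    """Filter by each (field, value) clue in turn.

    Returns the number of matching people before any clue and after each one.
    """
    remaining = list(population)
    sizes = [len(remaining)]
    for field, value in clues:
        remaining = [p for p in remaining if getattr(p, field) == value]
        sizes.append(len(remaining))
    return sizes


CLUES = [("city", "San Francisco"), ("age", 31), ("religion", "Jewish"),
         ("job", "writer"), ("height", "short")]

if __name__ == "__main__":
    print(narrow(POPULATION, CLUES))  # → [8, 6, 5, 4, 3, 2]
```

Even in this toy example, five ordinary facts shrink the field from eight people to two. The filters do not single out one person, but they leave a list short enough to check by hand.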
People may disagree about whether government agencies should have private information, but let’s not pretend they can’t learn whatever they want from the information they already have.