Facebook's Creepy Data-Grabbing Ways Make It The Borg Of The Digital World

The latest Facebook data breach — which exposed personal contact information that Facebook had harvested on 6 million of its users — is a reminder that even if you’re not handing over all your contact data to Facebook, it is obtaining and triangulating that data anyway. And even if you’re not on Facebook yourself, your contact data likely is because the social network is building a shadow profile of you by data-mining other people.

You might never join Facebook, but a zombie you — sewn together from scattered bits of your personal data — is still sitting there in sort-of-stasis on its servers waiting to be properly animated if you do sign up for the service. Or waiting to escape through the cracks of another security flaw in Facebook’s systems.

Facebook is a crowd-fueled, data-mining machine that’s now so massive (1.11 billion monthly active users as of March 2013) that it doesn’t matter if you haven’t ever signed up yourself to sign over your personal data. It has long since passed the tipping point where it can act as a distributed data network that knows something about almost everyone. Or everyone who leaves any kind of digital/cellular trace that can be fed into its data banks.

Chances are someone you have corresponded with, by email or mobile phone, has let Facebook’s data spiders crawl through their correspondence, thereby allowing your contact data to be assimilated entirely without your knowledge or consent. One such example was flagged to TechCrunch on Saturday, when a user who had been informed by Facebook that they were affected by its latest breach found it had harvested an email address they had never personally handed over.

This behaviour casts Facebook as the Borg of the digital world: resistance is futile. It also underlines exactly why the NSA wants a backdoor into this type of digital treasure trove. If you’re going to outsource low-level surveillance of everyone, then Facebook is one of a handful of tech companies large enough to have files on almost everyone. So really, forget the futuristic Borg: this ceaseless data-harvesting brings to mind the dossier-gathering attention to detail of the Stasi.

Does this matter? That depends on whether you care about privacy — your own or other people’s. Since Facebook is not immune to data leaks and security imperfections, as the latest bug illustrates (which has apparently been a puncture-hole in its systems since last year), the fact that it is harvesting and storing your data means there is an ongoing risk that data could be exposed to others without your consent. And that’s ignoring the primary lack of consent in Facebook storing your data without asking you in the first place.

Apparently it’s okay for your friends to consent to sharing your data on your behalf. Better choose your friends carefully then. Except it’s not even just your friends — it’s likely anyone you have had cause to correspond with in any capacity, friendship or otherwise. It seems unlikely Facebook’s algorithms are discerning enough to determine which contacts are friends, were once friends or have always only ever been passing/fleeting acquaintances and therefore have zero claim to be custodians of your personal data. Not that your real friends are likely aware they are acting as guardians of your data either.

Facebook says it uses the data it mines on you from others to power its friend recommendation feature. Which means the friend suggestion thumbnails that periodically crop up to help you build out your Facebook network, based on people its algorithms think you might know. This feature is helpful to Facebook, allowing it to encourage rapid growth of its users’ networks — by cutting down on the legwork required to find friends on the service — and therefore fuel overall user growth of its service. Sure, it’s also handy for individual Facebook users but is it useful enough to justify holding on to a vast mountain of personal contact data without consent?

The key issues here — beyond the overarching privacy theme — are transparency and consent. Facebook is very coy about explaining what it is doing. Do your friends even know they are consenting to your contact details being stored in Facebook’s cloud when they hook Facebook up to their contacts’ books? It’s highly unlikely they’re aware that that is what is happening. All they’re likely thinking is: ‘this feature will help me find more friends’. Facebook is certainly not going out of its way to explicitly say how its digital matchmaking service works.

You could argue that the average user won’t care about, or likely understand, a technical explanation. But that does not excuse Facebook treating your personal data as the property of another person who may or may not care where that data ends up. It’s your data, and you are the one affected if it’s leaked. But Facebook is sidestepping that reality by being opaque about its processes and failing to acknowledge there are wider privacy implications to its data-grabbing ways (Packet Storm has described one possible unpleasant scenario arising from the current Facebook data-harvesting process).

In its blog post detailing last week’s data breach, Facebook skimmed over the surface of its processes (see quotation below). It focused, instead, on explaining why it harvests data, rather than making it clear it is storing users’ friends’ phone numbers and email addresses to do this. Why avoid spelling that out? Because it inevitably sounds creepy. Because, well, it inevitably is creepy.

When people upload their contact lists or address books to Facebook, we try to match that data with the contact information of other people on Facebook in order to generate friend recommendations. For example, we don’t want to recommend that people invite contacts to join Facebook if those contacts are already on Facebook; instead, we want to recommend that they invite those contacts to be their friends on Facebook.

Because of the bug, some of the information used to make friend recommendations and reduce the number of invitations we send was inadvertently stored in association with people’s contact information as part of their account on Facebook. As a result, if a person went to download an archive of their Facebook account through our Download Your Information (DYI) tool, they may have been provided with additional email addresses or telephone numbers for their contacts or people with whom they have some connection. This contact information was provided by other people on Facebook and was not necessarily accurate, but was inadvertently included with the contacts of the person using the DYI tool.

Note Facebook’s phrasing: “This contact information was provided by other people on Facebook”. In other words, ‘your personal contact info was shared with us — but not by you’. That’s the root issue here, and Facebook is cloaking it with anodyne language — and burying it five paragraphs into the post. Transparent? Not even close.
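
To make that concrete, the matching Facebook describes could work roughly like the Python sketch below. It is an illustration only, not Facebook’s actual code: the function `process_upload`, the `contact_index` lookup and the `third_party_contacts` store are all hypothetical names, and the rule that a single matching detail is enough to link an address-book entry to a member is an assumption.

```python
from collections import defaultdict


def process_upload(uploader, address_book, contact_index, third_party_contacts):
    """Match one uploaded address book against known members (hypothetical sketch).

    address_book: list of contact-detail lists, one list per person.
    contact_index: dict mapping a normalised contact detail to a member id.
    third_party_contacts: dict of member id -> details supplied by other people.
    Returns (friend_suggestions, invite_candidates).
    """
    suggestions, invites = [], []
    for details in address_book:
        normalised = [d.strip().lower() for d in details]
        matches = {contact_index[d] for d in normalised if d in contact_index}
        if uploader in matches:
            continue  # skip the uploader's own entry
        if not matches:
            # Nobody on the service matches: candidate for an invitation.
            invites.append(normalised)
            continue
        for member in sorted(matches):
            suggestions.append(member)
            # Details the member never supplied are now stored against them.
            for d in normalised:
                if contact_index.get(d) != member:
                    third_party_contacts[member].add(d)
    return suggestions, invites


# Alice is a member who only ever gave the service her email address.
contact_index = {"alice@example.com": "alice"}
third_party_contacts = defaultdict(set)

# Carol uploads her address book, which also holds Alice's phone number.
book = [["Alice@Example.com", "+1 555 0100"], ["bob@example.com"]]
suggestions, invites = process_upload(
    "carol", book, contact_index, third_party_contacts
)

print(suggestions)
print(invites)
print(dict(third_party_contacts))
```

In this toy run Alice is suggested to Carol as a friend, Bob becomes an invite candidate, and Alice’s phone number ends up stored against her account even though only Carol supplied it. That last step is the kind of association the bug exposed.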

Of course Facebook is not the only tech giant intent on amassing data dossiers on as many Internet users as possible. Google has drawn the attention of European data protection regulators, for example, after it consolidated more than 60 individual product privacy policies into one joined-up policy, allowing it to join the dots of usage of its different products to sketch more detailed profiles of those users. Mountain View’s Google+ social layer is also designed to function as a data harvester, pushing people to tie their usage of multiple Google products back to a single public profile. As the Guardian’s Charles Arthur has argued, Google+ is not really a social network at all; it’s more like The Matrix.

But despite Google’s consolidated privacy policies drawing the attention of data protection regulators, the company has not (yet) altered its data-knitting course. It remains to be seen whether the investigation by six European Union member states will force it to make changes. The possibility of fines is on the table. But when you’re dealing with a company with such massive resources as Google (and one which pours so much effort into political lobbying), it likely requires a commensurately joined-up, global approach to have any hope of changing its behaviour. A handful of EU countries aren’t going to be able to turn this juggernaut around.

There is also the argument that the cat is out of the bag. That these huge data-mining operations are now so mature, extensive and well used that any kind of regulatory unpicking is futile. Not least because the quantity of data being gathered on human behaviour is only going to grow, likely becoming even more personal and intimate, with wearable devices enabling the harvesting of physical data points, too. And yet that actually lends a lot more weight to the argument that these huge data-harvesting operations really need proper scrutiny, stat.

It has to be said that data-protection regulators have been extremely flat-footed in their response to the implications of systematic consolidation and cross-referencing of personal data. The lack of transparency about how these algorithms work has certainly helped the companies that created them to grow their user-data mountains in carefully crafted shade.

But a little more light is now being directed onto those darkened places, and onto the control-minded organisations (such as the NSA) inevitably attracted by the scale of the data-mining operations going on behind some of the shiniest consumer facades in tech town. So, even if we as personal Internet-using individuals can’t now hope to claim absolute ownership of all our data online, it’s worth asking what other kind of data-fuelled Frankensteins are lurking in the darkness — besides Facebook’s zombie army of shadow profiles.