Privacy Theater: Why Social Networks Only Pretend To Protect You

Editor’s note: The following guest post was written by Rohit Khare, the co-founder of Angstro. Building his latest project, a social address book, has given him a deep familiarity with the privacy policies of all the major social networks.

I’d be wishing everyone a happier New Year if it were easier to mail out greeting cards to friends on Facebook and colleagues on LinkedIn. I’d like to use our free, real-time social address book, but their ‘privacy’ policies prevent us from downloading contact information, even for my own friends.

At least those Terms of Service (ToS) that force us to copy addresses and phone numbers one-by-one also prevent scoundrels from stealing our identity; reselling our friends to marketers; and linking our life online to the real world. Right?

Wrong. When RockYou can stash 32 million passwords in the clear; when RapLeaf can index 600 million email accounts; and when Intelius can go public by buying 100 million profile pages; then our social networks have traded away our privacy for mere “privacy theater.”

With apologies to Bruce Schneier’s brilliant coinage, “security theater” (e.g. the magical thinking behind forcing passengers to sit down and shut up for the last hour of international flights), social networks have been dogged by one disaster after another in 2009 because they pursue policies that provide the “feeling of improved privacy while doing little or nothing to actually improve privacy.”

As long as the same information that social networks piously prohibit their own customers from using is being bought and sold on the open market by giant marketing companies, social networks are only pretending to protect your privacy.

Industrial-Scale Identity Theft

Last week’s headlines brought news that RockYou had accumulated 32,603,388 identities over the past few years — and negligently stored them in plaintext in an incompetently protected database.

RockYou’s official bluster about “illegal intrusion” should fool no one. Blaming Imperva, the firm that exposed the flaw, or accusing the hacker(s) of being the identity thieves is misdirection: it was actually RockYou who stole those credentials, and RockYou should be held to account.

I realize that I’m using the incendiary terms “identity theft” and “stole,” even though I would agree that users voluntarily consented to type their passwords into RockYou’s forms. I assume that both users and RockYou’s developers actually only intended to share some particular bits of information: a contact list, a user photo, a friend’s gender; but the bottom line is that instead of sharing that specific data, RockYou retained enough secrets to impersonate those users at will.

  • Don’t blame the victims. Bemoaning the absence of open standards for users to share their own data; or complaining about the weaknesses of users’ password choices is merely changing the subject.
  • Don’t blame “security” technology. More encryption, better encryption, or stronger firewalls would not help, since the default RockYou username in this case was a user’s primary email address. For anyone who chose to use a popular Webmail service, that granted access to every other online service they’ve ever used — because of those ubiquitous “Forgot your password?” buttons to email it back to you (just ask Twitter how much fun that is).
  • Don’t blame RockYou’s partners, who hosted their widgets. They just wanted to give their users some fancy new slideshows and scoreboards and other features to put on their pages; that shouldn’t have required an all-out war for viral growth that forced users to log in and advertise their new widgets to all of their friends.

The fault, dear Reader, is not in our stars; it lies with sites that pretend to waive all care and duty by idly warning their users not to share their account passwords with anyone else.

In the absence of vigorous enforcement of those ToS agreements, any RockYou developer who passed up the opportunity to, say, phish MySpace passwords was putting their own employer at a disadvantage to any other startup that was willing to race them to the bottom.

APIs: Automating Privacy Intrusions?

RockYou minimized the scope of this breach by maintaining that it only affected its “legacy platform” for widgets rather than its larger “partner applications platforms” that use “industry standard security protocols.” After all, the advent of social networks’ partner APIs was supposed to make impersonation and scraping obsolete.

Those APIs came with their own new ToS agreements that added new, overlapping, and sometimes-contradictory restrictions as they worked through all of the implications of letting third parties in on the fun. The ACLU released a fun quiz that makes quite clear how much information is at stake, from your hometown to your friends’ sexual orientation.

For example, if you upload a photo of me that I find embarrassing, I could prevent you from tagging me in it, but I can’t forbid you from keeping your own photo online (or keeping it private, bugs aside). I can’t even forbid another friend of ours from caching a copy in his or her browser.

However, the Facebook API ToS can (and does) prevent a third-party application from caching a link to the photo for more than a day (a week on Orkut). Unfortunately, direct links to the photo server didn’t double-check the privacy policy, so a third-party app would be at risk of leaking images users thought were private, unless the developer remembered to make a separate API call every time to re-verify every photo on a page.
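To make that burden concrete, here is a minimal sketch of what a compliant third-party app would have to do: expire cached photo links after a day and re-verify visibility on every read. Everything here is hypothetical. `PhotoLinkCache` and the `is_visible` callback are illustrative names standing in for whatever privacy-check call a platform actually offers; none of this is the real Facebook or Orkut API.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

ONE_DAY_SECONDS = 24 * 60 * 60  # the one-day caching limit described above


@dataclass
class CachedLink:
    url: str
    fetched_at: float


class PhotoLinkCache:
    """Hypothetical cache: links expire after max_age and are re-verified on read."""

    def __init__(
        self,
        is_visible: Callable[[str, str], bool],  # assumed stand-in for a privacy-check API call
        max_age: float = ONE_DAY_SECONDS,
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._is_visible = is_visible
        self._max_age = max_age
        self._clock = clock
        self._links: Dict[Tuple[str, str], CachedLink] = {}

    def put(self, viewer_id: str, photo_id: str, url: str) -> None:
        self._links[(viewer_id, photo_id)] = CachedLink(url, self._clock())

    def get(self, viewer_id: str, photo_id: str) -> Optional[str]:
        key = (viewer_id, photo_id)
        entry = self._links.get(key)
        if entry is None:
            return None
        # Drop the link once it is older than the allowed caching period.
        if self._clock() - entry.fetched_at > self._max_age:
            del self._links[key]
            return None
        # Re-check the privacy setting on every read; the owner may have
        # restricted the photo since the link was cached.
        if not self._is_visible(viewer_id, photo_id):
            del self._links[key]
            return None
        return entry.url
```

Note the cost: every page view turns into one extra privacy check per photo, which is exactly the step a developer under deadline pressure is tempted to skip.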

He (or She) Who Must Not Be Named

In an ideal world, a third party developer shouldn’t have to store any personally-identifiable information (PII). In many jurisdictions, PII is akin to toxic waste, because of the regulatory burdens and civil, even criminal, liability for acquiring and disposing of it.

Here again, Facebook is the pacesetter: it’s possible to display “She liked 7 photos uploaded by Mr. Smith two weeks ago” using little more than a numeric user id. The developer writes a sentence in Facebook Markup Language (FBML), and Facebook’s servers will dynamically substitute the name, gender, item count, and ensure grammatical agreement of pronouns, singular/plural choices, and time intervals.

OpenSocial gadgets have to copy PII into the browser to format a sentence like that. LinkedIn’s partners even have to copy PII to their own servers, since their Open API is currently incompatible with AJAX authentication.

Even though copying PII is the root of all privacy risks, there are three reasons it can be necessary: latency, history, and agility. Without caches, slow API calls can make an app’s performance suffer. Without archives, analyzing only the most recent events can mislead an app’s trend detection or recommendation services. Without “offline” access, waiting for a user to log in again delays an app’s reaction to events in real-time.

There aren’t many technical countermeasures once data has been copied. LinkedIn spent more than a year tinkering with their public API, but the only substantial difference is that it now encrypts every member id with the identity of the developer and application to trace the source of a breach. I applaud them as an industry pioneer — though they’re so dependent on search-engine optimization that they still include the public numeric ids in the profile page URLs anyway.
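The general idea of a per-application member id can be sketched in a few lines. This is not LinkedIn’s actual scheme, which I haven’t seen documented; it swaps in a keyed hash (HMAC-SHA256) for whatever encryption they use, and the function names, separator, and key handling are all assumptions made for illustration.

```python
import hashlib
import hmac
from typing import Iterable, Optional, Tuple


def scoped_member_id(secret_key: bytes, developer_id: str, app_id: str, member_id: int) -> str:
    """Derive an id for one member that is unique to one developer and application.

    Illustrative only: a keyed hash, not the platform's real algorithm.
    """
    # A NUL separator keeps ("ab", "c") distinct from ("a", "bc"),
    # assuming the ids themselves never contain NUL.
    message = f"{developer_id}\x00{app_id}\x00{member_id}".encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()


def trace_source(
    secret_key: bytes,
    leaked_id: str,
    member_id: int,
    registered_apps: Iterable[Tuple[str, str]],
) -> Optional[Tuple[str, str]]:
    """Given a leaked scoped id and the member it belongs to, find which app was issued it."""
    for developer_id, app_id in registered_apps:
        candidate = scoped_member_id(secret_key, developer_id, app_id, member_id)
        if hmac.compare_digest(candidate, leaked_id):
            return developer_id, app_id
    return None
```

Two apps holding records for the same member see different ids, so they can’t join their datasets on that field, and an id that turns up in the wild points back to the app that received it. That only helps, of course, if the same member’s public numeric id isn’t also sitting in a profile URL.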

Exporting PII with legal strings attached is the best policy we can hope for. While Amazon’s ToS requires its associates to display accurate, up-to-date prices, Twitter has only recently realized the implications of searching deleted tweets and doesn’t yet oblige its API partners to update their copies when tweets are deleted or protected.

Buying Back Your Own Data? Priceless.

If PII is so hard to protect, then the only way for social networks to protect their users’ privacy must be to prohibit partners from accessing contact information in the first place. I might not be able to export my holiday card mailing list from my favorite social network (a roach motel for our data), but giant marketing corporations can buy and sell our private information with impunity.

I could go to Rapleaf right now to buy an analysis of any list of email addresses to learn its makeup by gender, income, residence, and all manner of other demographic data. Who’s to say how short that list could be—it’s a slippery slope from aggregate info to personal info. Or I could shop at one of Intelius’ many fronts and affiliates who are selling PII explicitly (TRUSTe-certified!). Or I could barter some of the stray business cards on my desk on Jigsaw to fill in the rest of the puzzle. All of these businesses depend on PII data harvested from social networks.

How is that possible? None of the social networks that we’ve integrated with has an API for reading email addresses — but all of them have no problem asking you to “Invite your friends!” After all, most social networks remain hypocritical enough to phish passwords to other social networks themselves as soon as they ask you to “Invite your friends” for their own viral growth!

Putting aside the hypocrisy of phishing passwords to scrape those friends’ email addresses in the first place, the subtler flaw is that social networks are more than happy to search their member database for those addresses to share a list of suggested friends. That’s how a Rapleaf could take a mailing list, pretend that those are all friends of theirs, and slowly accumulate a “reverse phonebook” that maps emails to social network profiles.

Or you could just crawl their websites. Social networks depend on search engines for traffic, so they almost universally have public pages for every member with well-known URLs and directory listings by name for crawlers to index. A mini-boomlet in funding “people search” startups underwrote this massive exercise, but they sold their archives to less-than-savory marketers.

Now, merely indexing public web pages can’t be evil—but reconciling online identities and 3rd-party advertising cookies with real-world credit reports, government records, and other databases can be. Adding in all that information doesn’t increase Mr. Smith’s anonymity; Jeff Jonas has made a small fortune proving that semantic reconciliation dramatically collapses uncertainty. Just think about combining Spock’s 100M profiles with Intelius’ 20B other data points; or Wink’s 200M profiles with Reunion MyLife’s 34M members and 700M records…

Whose Data Is It, Anyway?

The philosophical question at hand is what rights do I have in my friends’ information. When I accept a business card from someone I’ve just met, I don’t believe I have the right to re-sell it on Jigsaw in good conscience (they’d disagree 18M times). If it’s a colleague’s card, on the other hand, I might take the initiative to forward a new lead, or even buy a gift subscription to a magazine. Does that constitute a violation of their privacy, or spam?

Social networks haven’t let their users make their own decisions on this issue. Through selective enforcement of their policies, some startups get locked out while big partners get exemptions. ended up in (and out of) court. Plaxo found out the hard way that they couldn’t assist their paying customers to OCR Facebook email addresses; or to synchronize with LinkedIn. It says a lot about LinkedIn’s draconian ToS that even with paying customers demanding it, Comcast hasn’t signed up for their API. Even if users manually download their own LinkedIn address books, it won’t even include links back to folks’ public profile pages.

Don’t Accept Incompetence

I also claim that social networks are engaging in Privacy Theater because there’s no shortage of examples of organizations on the Web that process vast quantities of PII while providing real privacy protection. Do you think that the “bad guys” haven’t gone after Webmail services to phish passwords and harvest contact information? Aren’t e-commerce sites sharing product information and reviews out to legions of affiliates without leaking your purchase history? How long do you think RockYou would have gotten away with it if they were asking for your online banking username instead of your email address?

Social network sites have not (yet) demonstrated the high degree of proactive surveillance and enforcement characteristic of other organizations that deal with PII on the Internet. Users see worms on MySpace and viruses on Facebook, but not on Hotmail — because they defend against cross-site-scripting attacks. Users find malware distributed on Slide, but not on Wikipedia — because they filter content aggressively. Users are blocked by DDoS attacks and DNS attacks on Twitter — but Amazon stays up because they can react in real-time (mostly). How much more quickly do Cease & Desist letters for putting up a fake PayPal logo go out than for impersonating a Facebook Page?

From personal conversations, I’m beginning to wonder if the recent rise of Hadoop is part of the problem, surprisingly. Trying to detect patterns of abusive crawling and suspicious bursts of activity from partner apps by analyzing yesterday’s log files alerts you too late to react. The culture of many social networking websites seems to emphasize page load times (especially after the great Friendster meltdown), which isn’t quite the same as the enterprise IT, networking, and transactional database backgrounds of other leading Web architects. And unlike the formal (and informal) networks of security officials at online financial institutions to track distributed threats, I fear we have little evidence of coordinated responses to privacy threats that correlate identities across social networks.
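The alternative to yesterday’s log files is to count requests as they arrive. Below is a minimal sketch of a sliding-window burst detector, assuming each request carries a partner app id and a timestamp in seconds, and that timestamps arrive in non-decreasing order per app. `BurstDetector` and its thresholds are hypothetical; a real deployment would need shared state across servers and far more careful tuning.

```python
from collections import defaultdict, deque
from typing import Deque, Dict


class BurstDetector:
    """Flags an app once it exceeds max_requests within the last window_seconds."""

    def __init__(self, max_requests: int, window_seconds: float) -> None:
        self._max_requests = max_requests
        self._window_seconds = window_seconds
        self._events: Dict[str, Deque[float]] = defaultdict(deque)

    def record(self, app_id: str, timestamp: float) -> bool:
        """Record one request; return True if the app is now over its limit."""
        window = self._events[app_id]
        window.append(timestamp)
        # Discard requests that have fallen out of the sliding window.
        cutoff = timestamp - self._window_seconds
        while window and window[0] <= cutoff:
            window.popleft()
        return len(window) > self._max_requests


# Usage: allow at most 3 requests per app in any 10-second window.
detector = BurstDetector(max_requests=3, window_seconds=10)
flags = [detector.record("hypothetical-app", t) for t in (0, 1, 2, 3, 20)]
```

The point is not this particular algorithm but when the answer arrives: the fourth request in a burst is flagged as it happens, not in a report the next morning.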

I have first-hand experience that it takes more time (and more money) to ship applications that comply with social networks’ privacy policies. If we weren’t living with Privacy Theater, that might not have been a wasted investment. Inevitably, Gresham’s Law kicked in, and the good guys are being driven out by the bad guys (spammy apps, scammy apps, sneaky apps, conniving apps).

Privacy Theater: The Show Must Go On…

Naturally, I prefer to think of myself as one of the ‘good guys.’ I prefer to believe that privacy protection is a competitive advantage that users (citizens!) really value. Until this outrageous RockYou breach, I didn’t fully realize how irrelevant that is.

I’d argue that the hapless state of ToS enforcement by the major social network platforms only provides the feeling of improved privacy while doing little or nothing to actually improve privacy: that’s privacy theater.

Unfortunately, that analogy is still unfair: TSA may screen children at the airport, but at least their security theater doesn’t obscure the fact we haven’t had a catastrophic security failure in the US air transportation system (yet). Our major social networks’ privacy theater is distracting us from ongoing, large-scale identity theft and misuse of private and personally-identifiable information.

If the industry expects self-regulation to forestall government regulation, well, here’s what I think it would take: An immediate ban on all of RockYou’s applications by all of their partners, pending a public audit of all of their apps. That’s taking a page from the audit provisions of LinkedIn’s ToS and adding sunlight by publishing the results.

Sounds harsh? I thought the market was supposed to provide swifter, surer justice than some pesky regulator with its clunky old notions of due process and presumptions of innocence. API agreements are a private matter between ruthless corporations. Heck, if they really wanted to put the rest of the ecosystem on notice, they ought to audit every application funded by Sequoia, Partech, DCM, and Softbank, all lead investors in RockYou.

It’s not like lawsuits are being filed, as Marissa Mayer announced by going after work-from-home scam artists in an interview with Mike Arrington at LeWeb. It’s not like this is Scamville 2.0, since this isn’t stealing users’ cash, only their dignity. It’s not like there’s a legal spotlight on the issue, since there’s only $9M set aside for a hazy new privacy foundation in the latest Facebook class-action settlement. It’s not like it’s a political issue in the headlines, since a Facebook Chief Privacy Officer is running for Attorney General, the top law-enforcement office in California. It’s not like it’s as complicated as “don’t be evil,” since I can give you one simple tip to eliminate privacy theater: enforce your ToS and obey others’ ToS — or else stop setting unrealistic expectations and just let users have their data back!

(Photo credit: Flickr/FaceMePLS).