Hacking The Facebook Platform For Data Portability

The following guest post was written by Dan Birdwhistell, founder of people directory Bigsight (reviewed here) and creator of Hacking Facebook, a website that teaches developers how to pull user data out of Facebook.


There’s one thing about Facebook that most people still seem to get wrong: that it’s a walled garden. Quite the contrary: the Platform allows for full data portability, and has since its inception.

The problem is that this knowledge is buried deep within the FB documentation, a place few developers have wandered. For whatever strange reason, legal documents are like amusement parks for me, so I’m now fairly well acquainted with the ins and outs of porting data (and users) out of FB. So that’s what this whole post is about: To show you how it’s done.

Background

Once we got our heads around the Platform back in October 2007, we hacked together FriendCSV as a demonstration. This is an app that allows you to export your full social graph (and all friend data) to your hard drive. This is all done in accordance with FB policies. After people got comfortable with this, we took it a step further by allowing users to instantly port their own personal data into bigsight to create a new profile and account. Test out our importer here.

Why Facebook and the Platform are important

We believe FB is architecting the next version of the web. This is a bold claim, no doubt, but here’s the thinking:

  1. FB has the users: 80 million and growing, with huge international membership and no age bias.
  2. Users enter their real information: Users enter their real name and affiliations. This moves the web away from (and makes users comfortable with abandoning) aliases.
  3. Users express themselves by connecting to entities that are “outside”: Users articulate their identity by claiming lasting elements like cities, companies, schools, and groups (or pages) that exist outside of FB.
  4. These entities are increasingly moving “in”: These groups are connecting to the same users and establishing broad footprints through ads, Pages, and Applications.
  5. The Platform and FB Connect are building the “between”: All the nice-happy-fun going on between Users and entities inside FB will start to extend back out into the web as developers learn how to build data/interaction bridges with the Platform and Connect.

The result is a web based on users and not content, with an individual’s FB ID ultimately serving as his chief tour guide, passport, and keymaster (but not like Vinz Clortho) around the rest of the web. If I am right, FB will become king: not as a social network, but as the architect, owner, and manager of the next version of the web. The point is that you need to know how FB works and how you can leverage the Platform to grow your site or business. So here we go…

Understanding how FB Data is structured

Before you go messing around in the pool house, you’ll need to get your head around how everything is structured. It’s best to first focus entirely on non-user data given that these are the permanent structures users “claim”. Each of these elements has a unique ID and entry fields are typically auto-complete to ensure data alignment.

  • Location: There are ~540 regional networks and ~24,000 city/state/country listings. Cities in the US are expressed as “City, State abv.” while cities in other countries are expressed as “City, Country Name”. Regional networks outside of the US, Canada, and the UK are typically expressed just as a country. Users claim locations through networks, current city, hometown, work cities, groups, pages, events, and photo albums.
  • High Schools: There are ~23,000 worldwide high schools in FB. Users can enter up to two high schools, with graduation year for one of them. High school name and year is expressed on the profile.
  • Colleges and Universities: FB recognizes ~5,000 institutions. To streamline search during data entry, FB allows for multiple aliases for the same school. For instance, a user can search/find/select “UCLA” or “University of California, Los Angeles”. Whichever one is selected displays on the profile, though both are linked to the same ID. This makes data integration a bit dicey, but there’s a fix we’ll get to later. Users can enter up to five schools and can ascribe graduation year, type, concentration, and degree type (if it is a grad school).
  • Companies: You’ll find ~25,000 different companies. FB allows for multiple aliases during search, but it filters them out to the same display name across all profiles. We’re clueless as to why they did this for companies but not schools. Users can enter up to 15 jobs and can ascribe position, description, location, and duration.
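To make the location format described above concrete, here is a minimal Ruby sketch that splits a location string into its parts. The `parse_location` helper is hypothetical, and the rule that a two-letter uppercase suffix means a US state abbreviation is our own assumption for illustration, not something the Platform provides.

```ruby
# Hypothetical helper: split a Facebook-style location string into parts.
# Assumption: a two-letter uppercase suffix is a US state abbreviation;
# anything else after the comma is treated as a country name.
def parse_location(raw)
  city, region = raw.split(",", 2).map(&:strip)
  # A bare name with no comma is treated as a country-level regional network.
  return { :city => nil, :state => nil, :country => city } if region.nil?

  if region =~ /\A[A-Z]{2}\z/
    { :city => city, :state => region, :country => "United States" }
  else
    { :city => city, :state => nil, :country => region }
  end
end

p parse_location("Nashville, TN")
p parse_location("Paris, France")
p parse_location("Brazil")
```

Parsing once at import time means the rest of your site can work with separate city, state, and country values instead of re-splitting strings everywhere.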

So exactly how much data can you export?

Stated simply, you can touch basically everything but a user’s contact information. So here’s the list, including how the data is structured in its output. We’ll address friend lists and data in a moment.

Data Element              Export Format
UID                       Permanent
First name                Free form (ff)
Last name                 ff
About me                  ff
Activities                ff
Birthday                  Day, Month, Year (1900-2008)
Books                     ff
Colleges                  Up to five: name, type, degree, concentration, grad year
Hometown                  “City, State” or “City, Country” if outside the US
High school               Up to two: name, grad year
Interests                 ff
“interest sex”            Male or female
“interest meeting”        Friendship, Dating, Relationship, or Networking
Location                  “City, State” or “City, Country” if outside the US
Movies                    ff
Music                     ff
# of notes                Number
# of wall posts           Number
Networks (up to four)     Region, High School, College, Work
Photo albums              All pictures + tags, titles, etc.
Pictures                  Misc. pictures + tags, etc.
Political Affiliation     Party name
Profile pictures          50×50, 50×150, 100×300, or 200×600
Profile update time       Date, time
Quotes                    ff
Relationship Status       Single, in a relationship, engaged, married, it’s complicated, open relationship
Sex                       Male or female
ID of Significant Other   UID
Status message            ff + date/time
Timezone                  Offset from GMT as a number (“-6” for Nashville, for instance)
TV shows                  ff
Work History              Up to 15 companies: name, position, description, location, duration
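As a rough sketch of how a few of the profile elements listed above might be flattened into a CSV file (the general idea behind an export tool such as FriendCSV, though not its actual code), consider the following. The `Profile` struct, its field names, and the sample values are all made up for illustration.

```ruby
require "csv"

# Hypothetical in-memory record; field names are illustrative, not the
# Platform's own. Free-form fields are strings, multi-valued ones are arrays.
Profile = Struct.new(:uid, :first_name, :last_name, :hometown, :colleges)

# Flatten one profile into a CSV row, joining multi-valued fields with "; ".
def profile_to_row(profile)
  [profile.uid, profile.first_name, profile.last_name,
   profile.hometown, profile.colleges.join("; ")]
end

# Build the full CSV text: a header row, then one row per profile.
def profiles_to_csv(profiles)
  CSV.generate do |csv|
    csv << %w[uid first_name last_name hometown colleges]
    profiles.each { |profile| csv << profile_to_row(profile) }
  end
end

# Made-up sample data.
sample = Profile.new(1234, "Jane", "Doe", "Nashville, TN", ["Example University"])
puts profiles_to_csv([sample])
```

Using the standard `CSV` library rather than joining strings by hand matters here because values such as “City, State” contain commas and need to be quoted.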

In addition to these core profile elements, you can also make calls for and then export huge amounts of data through:

  • Events: Title, location, date (duration), picture, type, members, etc.
  • Pages: Name, type, location, hours, members, etc.
  • Groups: Name, type, description, location, members, etc.

Now about friend lists: As you’ll see when you use FriendCSV, you can not only access all of the above for a single user, but you can also access the same data from their friends. Pretty crazy, right? This means that by touching one user you can instantly touch thousands more. But hold on now…time to talk Privacy.

Understanding FB Privacy, Terms of Service, and Platform Documentation

There are five key documents that come into play re: data portability on FB. Taken alone, each is hard enough to understand – taken together, it’s downright labyrinthine. As a developer, though, there are really only four things you need to know:

  • The Onus of Privacy is on the User: While FB puts restrictions on how you can access and store information, they ultimately put the onus on the user when he interacts with an application. This means that users interact with apps at their own risk. From the Privacy Policy:

    “If you, your friends, or members of your network use any third-party applications developed using the Facebook Platform, those Platform Applications may access and share certain information about you with others in accordance with your privacy settings…

    …in addition, third party developers…may also have access to your personal information (excluding your contact information) if you permit Platform Applications to access your data.”

  • The 24-hour Clause: Most of you have heard of this. It basically states that you can suck out any data, but you can’t store it for more than 24 hours; however, there are two key things that people overlook: 1) There are some elements that can be stored indefinitely and 2) if there is a disclaimer on the application, the developer can do almost anything with the data.
  • The “Storable Indefinitely” Properties: FB allows us to store User ID, Network ID, Event ID, Group ID, and Photo ID.
  • The Gold in the Mountain — “Full Disclosure Opt-Ins”: As a clear extension of FB putting the onus on the user, they have included a clause in their documentation that says that developers can do almost anything with the data they touch if they have full disclosure. Taken from 2.A.6 of the TOS:

    “You may retain copies of Exportable Facebook Properties for such period of time (if any) as the Applicable Facebook User for such Exportable Facebook Properties may approve, if (and only if) such Applicable Facebook user expressly approves your doing so pursuant to an affirmative “opt-in” after receiving a prominent disclosure of a) the uses you intend to make of such Exportable Facebook Properties, b) the duration for which you will retain copies of such Exportable Facebook Properties, and c) any terms and conditions governing your use of such Exportable Facebook Properties (a “Full Disclosure Opt-In”).”

    This is a bit wordy, so we’ll translate: If you outline which data you’ll use, how you’ll use it, for how long, what other terms the User might be subject to, and get User consent, then you can keep and use profile information for as long as you want.
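The rules summarized above can be sketched as a small retention check. This is only our reading of the points in this post expressed as code, not a substitute for the documents themselves; the `CachedField` struct and the function names are hypothetical.

```ruby
# Properties this post lists as storable indefinitely.
STORABLE_INDEFINITELY = [:user_id, :network_id, :event_id, :group_id, :photo_id]
MAX_AGE_SECONDS = 24 * 60 * 60

# Hypothetical cached value: the field name, its value, when it was fetched,
# and whether the user gave a full-disclosure opt-in covering it.
CachedField = Struct.new(:name, :value, :fetched_at, :opted_in)

# Returns true if the field may still be kept at time `now`.
def retainable?(field, now = Time.now)
  return true if STORABLE_INDEFINITELY.include?(field.name)
  return true if field.opted_in
  (now - field.fetched_at) <= MAX_AGE_SECONDS
end

# Keep only the fields that may still be retained; the rest are dropped.
def purge_expired(fields, now = Time.now)
  fields.select { |field| retainable?(field, now) }
end
```

Running something like `purge_expired` from a scheduled job is one way to make sure stale profile data does not linger past the 24-hour window.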

So the main lesson here is that you shouldn’t be afraid of the various policies and documents because they are outlined to help you rather than restrict you. But again… a note about friends’ data. FB has been incredibly aggressive in policing how developers are accessing and using these data, and rightfully so. Last week they shut down the Top Friends app for allowing too much data access and earlier this year they canned Google Friend Connect because it didn’t operate in accordance with their policies.

I’ll say again that they were right to do this. When thinking through how to port users, you should be mindful not just that FB might shut you down, but that a secondary friend who doesn’t opt in to your site probably should be left alone. More than likely, he doesn’t want what you’re selling. Of course, there are ways around this if you want to brute force it, but we’ll just keep that to ourselves. So let’s keep going…

Setting up the Application(s) and managing the exports

Your importer can be inside FB as part of an application or it can exist as a standalone. We do it both ways. With FriendCSV, users install the app and we then direct them to their new profile as an add-on; meanwhile, out in the ether, we have a dedicated portal at http://fb.bigsight.org that directs users to FB for initial authentication, but then kicks them right back to our web app. If you already own a great app with lots of traffic, start there. If not, it’s probably best to set up your porter out on the web. Exporting the key data for a single user doesn’t take too long, so you can typically create a new page/account for them instantly. However, if you plan on exporting an element like friends lists (careful, hoss) or photos, you’ll need to batch up FQL requests when possible and also be open to allowing some processes to happen in the background.
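Batching FQL requests, as suggested above, can be sketched like this: instead of one query per friend, build one query per group of UIDs. The batch size of 50 is an arbitrary assumption, and `batched_fql_queries` is a hypothetical helper rather than part of any library.

```ruby
# Assumed batch size; tune it to whatever the API tolerates in practice.
BATCH_SIZE = 50

# Build one FQL query string per batch of friend UIDs.
def batched_fql_queries(uids, fields = %w[uid first_name last_name])
  uids.each_slice(BATCH_SIZE).map do |batch|
    # Integer() rejects anything that is not a plain number, so a bad UID
    # cannot smuggle extra text into the query.
    id_list = batch.map { |uid| Integer(uid) }.join(", ")
    "SELECT #{fields.join(', ')} FROM user WHERE uid IN (#{id_list})"
  end
end

queries = batched_fql_queries((1..120).to_a)
puts queries.length
puts queries.last
```

Each resulting query string can then be handed to a background worker, so the user gets a new page right away while the friend data fills in behind the scenes.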

The FB API is “REST-like,” which means it can be used by anything that handles standard HTTP requests. Libraries exist for PHP, Java, Ruby, and other languages that make the API easier to use. The following example code is for Ruby on Rails and the Facebooker library, as that’s what we use at bigsight. No matter which language you choose, writing FB applications to extract data is surprisingly easy. One line of code will tell your application to authenticate with FB. Simply add “ensure_authenticated_to_facebook” to your Rails controller and it will send your user to the FB login page if needed, and return them to your application. From that point on you have full access to the FB user and all exportable data. Here’s one example of how to extract educational history:

def gather_schools
  # Create a local copy of the Facebook user.
  # (@fb_user is assumed to have been set earlier from the authenticated session.)
  @user = User.create(:name => @fb_user.name, :fb_uid => @fb_user.uid)
  # Load the user's schools, creating one local School record per entry
  @fb_user.education_history.each do |fb_school|
    School.create(:name => fb_school.name, :user_id => @user.id)
  end
end

For a full view of the FQL queries, check out this page in the documentation.
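If you are not using a library, the “REST-like” part comes down to sending signed HTTP parameters. The sketch below shows the signing step as we understand it: sort the parameters by key, join them as key=value pairs with no separator, append your application secret, and take the MD5 digest. Treat the scheme, the parameter names, and the placeholder key and secret as assumptions to check against the documentation.

```ruby
require "digest/md5"

# Return a copy of params with a "sig" entry added.
# Assumed scheme: MD5 of the sorted key=value pairs followed by the secret.
def signed_params(params, secret)
  base = params.sort_by { |key, _value| key.to_s }
               .map { |key, value| "#{key}=#{value}" }
               .join
  params.merge("sig" => Digest::MD5.hexdigest(base + secret))
end

params = {
  "method"  => "facebook.fql.query",
  "api_key" => "YOUR_API_KEY",         # placeholder
  "v"       => "1.0",
  "call_id" => Time.now.to_f.to_s,     # must increase with each call
  "query"   => "SELECT education_history FROM user WHERE uid = 1234" # made-up UID
}
p signed_params(params, "YOUR_SECRET").keys
```

Libraries like Facebooker handle this for you, which is a good reason to use one; the sketch is mainly useful for seeing what goes over the wire.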

Integrating FB Data into an Existing Third Party Site

OK, so now that you know what the data look like and how to access them, you need to think through a few things to figure out how to integrate it all with your site or widget. These are the questions to ask:

  • What are the basic data elements you need for a user to interact with your site? Start by isolating the variables you need to a) successfully port a user to your site and b) give them enough active features that they instantly get a taste for your offering. Design your integration so that it is as simple (though complete) as possible. You might also consider including an “instant remove” link so that a user can quickly exit and take back his data.
  • What deep database elements do you need to align? This might take a bit of work depending on what types of information you need. For instance, we suck out and integrate city, company, and school data. This sounds easy enough, but it gets dicey: There are quite often many names for the same entity. So if you want to align these elements, you need to: a) figure out what FB calls them and then b) use that naming system or make it line up with yours so that your importer can identify multiple aliases.
  • How can you enrich user data in a novel way? There’s tons of win to be had if you can figure out a way to enrich a user’s data. We do this in two ways on bigsight:
    1. We match their school data against our own database and add the school logo to their profile pages. Furthermore, our school links go to pages that instantly show them people they may know. Here’s my alma mater, for instance: http://bigsight.org/school/wlu
    2. We built an algorithm that constructs full biographies based on a user’s profile data. This is fully dynamic and can have up to 140 different combinations depending on which school, company, and city data the user has and how he has structured it.

    Basically, get creative. It’s almost silly how many cool things can be done here.

  • Is there any way to leverage group, page, or event data? Check this out: http://bigsight.org/city/nashville_tn/events. This is a display of the events that I RSVP’d to in Nashville over the past year. Sucking out this data is fully legit. It doesn’t take long to realize how entirely new sites can now be built based on even one or two User imports.
  • How can you set up a User account? You might have to get creative when it comes to getting information (namely email) that isn’t directly available, though often needed to set up a working account. We ask for a user’s email up front and assign them a temporary login and pw based on this.
  • Are you going to store their raw data output? We highly recommend discarding their original raw data, even if you have a full disclosure. It’s simply better for the user and for the web. Remember that you can keep the User ID, and if you codify the information in some way, you’re in the clear.
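The alias alignment described above can be sketched as a lookup table in which every known spelling of an entity points to one canonical ID. The table contents, the ID value, and the helper names are made up for illustration; in practice you would build the table from whatever names FB actually returns.

```ruby
# Hypothetical alias table: every known spelling maps to one canonical ID.
SCHOOL_ALIASES = {
  "ucla" => 1001,
  "university of california, los angeles" => 1001
}

# Normalize a display name so trivial differences (case, stray whitespace)
# do not break the lookup.
def normalize_name(name)
  name.downcase.strip.gsub(/\s+/, " ")
end

# Return the canonical ID for a school name, or nil if it is unknown.
def canonical_school_id(name)
  SCHOOL_ALIASES[normalize_name(name)]
end

p canonical_school_id("UCLA")
p canonical_school_id("University of California,  Los Angeles")
p canonical_school_id("Some Unlisted School")
```

Names that come back as nil are worth logging, since they tell you which aliases your importer still needs to learn.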

Conclusion

Like I said above, we believe that FB is on the path to doing something amazing with the web, and we believe that everyone in the industry needs to know how to not just adapt to it, but also thrive from (and alongside) it. It should be an interesting summer re: the web as Facebook Connect launches and more and more people begin leveraging this and the Platform for utility rather than blind user engagement.

Our opinion is that while FB Connect will offer some amazing functionality with regard to quick user integration and syncing, it likely won’t be as powerful as the Platform in terms of data access. Either way, these developments will not only change how users interact with third party sites, but they will also raise the bar for user experience as individuals accustomed to the FB UI begin to demand increased alignment. Soon we’ll likely see businesses start to build sites on the back of FB rather than a) going out on their own or b) doing what could prove to be complicated integration. Additionally, we’ll probably also find resolutions to a few ongoing discussions and questions, such as who owns a friend list and how what FB calls “dynamic privacy” actually works out in the wild.

It’s all pretty interesting stuff to think through and incredibly fun to see it all come together so quickly. Creative destruction all around, you know. Lots of warriors in the arena. ARE YOU NOT ENTERTAINED?