Hacking The Facebook Platform For Data Portability

The following guest post was written by Dan Birdwhistell, founder of the people directory Bigsight and creator of Hacking Facebook, a website that teaches developers how to pull user data out of Facebook.


There’s one thing about Facebook that most people still seem to get wrong: that it’s a walled garden. It isn’t. Quite the contrary, the Platform has allowed full data portability since its inception.

The problem is that this knowledge is buried deep within the FB documentation, a place few developers have wandered. For whatever strange reason, legal documents are like amusement parks for me, so I’m now fairly well acquainted with the ins and outs of porting data (and users) out of FB. That’s what this post is about: showing you how it’s done.

Background

Once we got our heads around the Platform back in October 2007, we hacked together FriendCSV as a demonstration. It’s an app that lets you export your full social graph (and all friend data) to your hard drive, in accordance with FB policies. Once people got comfortable with that, we took it a step further by allowing users to instantly port their own personal data into Bigsight to create a new profile and account.

Why Facebook and the Platform are important

We believe FB is architecting the next version of the web. This is a bold claim, no doubt, but here’s the thinking:

  1. FB has the users: 80 million and growing, with huge international membership and no age bias.
  2. Users enter their real information: Users enter their real names and affiliations. This moves the web away from aliases (and makes users comfortable abandoning them).
  3. Users express themselves by connecting to entities that are “outside”: Users articulate their identity by claiming lasting elements like cities, companies, schools, and groups (or pages) that exist outside of FB.
  4. These entities are increasingly moving “in”: The same entities are connecting to these users and establishing broad footprints through ads, Pages, and Applications.
  5. The Platform and FB Connect are building the “between”: All the nice-happy-fun going on between users and entities inside FB will start to extend back out into the web as developers learn how to build data and interaction bridges with the Platform and Connect.

The result is a web based on users rather than content, with an individual’s FB ID ultimately serving as his chief tour guide, passport, and keymaster (but not like Vinz Clortho) to the rest of the web. If I’m right, FB will become king: not as a social network, but as the architect, owner, and manager of the next version of the web. The point is that you need to know how FB works and how you can leverage the Platform to grow your site or business. So here we go…

Understanding how FB Data is structured

Before you go messing around in the pool house, you’ll need to get your head around how everything is structured. It’s best to focus first on non-user data, since these are the permanent structures (cities, companies, schools, groups) that users “claim”. Each of these elements has a unique ID, and entry fields typically use auto-complete so that everyone’s entries line up with the same elements.
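To make the idea of users “claiming” permanent structures concrete, here is a minimal Ruby sketch. It assumes every shared element (a school, a city, a company) is identified by a single ID and that a profile stores only the IDs it claims; the Entity and Profile structs, the members_by_entity helper, and the sample IDs and names are hypothetical illustrations, not Platform objects.

```ruby
# Hypothetical model: shared entities have one unique ID each, and a
# profile stores only the IDs of the entities its owner has claimed.
Entity  = Struct.new(:id, :kind, :name)
Profile = Struct.new(:uid, :name, :claimed_ids)

# Index user IDs by the entity IDs they claim. Because every profile points
# at the same ID rather than free-typed text, users who picked the same
# school or city group together exactly.
def members_by_entity(profiles)
  index = Hash.new { |hash, key| hash[key] = [] }
  profiles.each do |profile|
    profile.claimed_ids.each { |id| index[id] << profile.uid }
  end
  index
end

entities = {
  101 => Entity.new(101, :school, "Example University"),
  202 => Entity.new(202, :city, "Nashville, Tennessee")
}
profiles = [
  Profile.new(1, "Ann", [101, 202]),
  Profile.new(2, "Bob", [202])
]

members_by_entity(profiles).each do |id, uids|
  puts "#{entities[id].name}: #{uids.inspect}"
end
```

Two users who typed “Nashville” and “Nashville, TN” by hand would never match; two users who picked the same auto-completed ID always do.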

So exactly how much data can you export?

Stated simply, you can touch basically everything but a user’s contact information. Here’s the list, including how each element is structured in the export. We’ll address friend lists and friend data in a moment.

Data element             Export format
UID                      Permanent
First name               Free form (ff)
Last name                ff
About me                 ff
Activities               ff
Birthday                 Day, month, year (1900-2008)
Books                    ff
Colleges                 Up to five: name, type, degree, concentration, grad year
Hometown                 “City, State”, or “City, Country” if outside the US
High school              Up to two: name, grad year
“interest sex”           Male or female
“interest meeting”       Friendship, dating, relationship, or networking
Interests                ff
Location                 “City, State”, or “City, Country” if outside the US
Movies                   ff
Music                    ff
Number of notes          Integer
Number of wall posts     Integer
Networks                 Up to four: region, high school, college, work
Photo albums             All pictures plus tags, titles, etc.
Pictures                 Misc. pictures plus tags, etc.
Political affiliation    Party name
Profile pictures         50×50, 50×150, 100×300, or 200×600
Profile update time      Date, time
Quotes                   ff
Relationship status      Single, in a relationship, engaged, married, it’s complicated, open relationship
Sex                      Male or female
ID of significant other  UID
Status message           ff + date/time
Timezone                 Offset from GMT in hours (e.g. “-6” for Nashville)
TV shows                 ff
Work history             Up to 15 companies: name, position, description, location, duration
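Two of the formats above pack structure into a string: Hometown and Location arrive as “City, State” or “City, Country”, and Timezone is an hour offset from GMT. Below is a minimal Ruby sketch for turning those into usable values, assuming the raw values look exactly as the table describes. parse_place and utc_offset_string are hypothetical helper names, not part of Facebooker or the Platform.

```ruby
# Hypothetical helpers for two of the exported field formats.
Place = Struct.new(:city, :region)

# Split "City, State" or "City, Country" on the first comma.
# The region is nil when the value contains no comma.
def parse_place(raw)
  city, region = raw.to_s.split(",", 2).map(&:strip)
  Place.new(city, region)
end

# Convert an hour offset such as -6, "-6", or 5.5 into a "+HH:MM" string,
# which Ruby's Time#getlocal accepts as a fixed UTC offset.
def utc_offset_string(hours)
  total_minutes = (Float(hours) * 60).round
  sign = total_minutes.negative? ? "-" : "+"
  h, m = total_minutes.abs.divmod(60)
  format("%s%02d:%02d", sign, h, m)
end

place = parse_place("Nashville, Tennessee")
puts "#{place.city} / #{place.region}"

# 18:00 UTC shown in a user's exported timezone of "-6"
puts Time.utc(2008, 1, 15, 18, 0, 0).getlocal(utc_offset_string(-6))
```

Time#getlocal applies a fixed offset, so it won’t follow daylight saving time; an hour offset on its own can’t tell you whether a zone observes DST.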

In addition to these core profile elements, you can also request and export huge amounts of data through:

Now about friend lists: as you’ll see when you use FriendCSV, you can access all of the above not only for a single user but also for each of their friends. Pretty crazy, right? By touching one user you can instantly touch thousands more. But hold on now: time to talk privacy.
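FriendCSV’s basic idea, writing a friend list and its data out to a file, can be sketched with Ruby’s csv library. This assumes the friend records have already been fetched into plain hashes; the column names are a hypothetical subset of the table above, not FriendCSV’s actual output format.

```ruby
require "csv"

# Hypothetical subset of the exportable fields, used as CSV columns.
COLUMNS = %w[uid first_name last_name hometown location].freeze

# Turn an array of friend hashes (string keys) into CSV text with a header
# row. A missing field comes through as nil and is written as an empty cell.
def friends_to_csv(friends)
  CSV.generate do |csv|
    csv << COLUMNS
    friends.each do |friend|
      csv << COLUMNS.map { |column| friend[column] }
    end
  end
end

friends = [
  { "uid" => 1001, "first_name" => "Ann", "last_name" => "Lee",
    "hometown" => "Nashville, Tennessee", "location" => "Austin, Texas" },
  { "uid" => 1002, "first_name" => "Bob", "last_name" => "Ray" }
]

puts friends_to_csv(friends)
```

Friends who filled in less of their profile still produce well-formed rows, and the csv library quotes values such as “Nashville, Tennessee” that contain commas, so the file opens cleanly in a spreadsheet.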

Understanding FB Privacy, Terms of Service, and Platform Documentation

There are five key documents that come into play regarding data portability on FB. Taken alone, each is hard enough to understand; taken together, they’re downright labyrinthine. As a developer, though, there are really only four things you need to know:

So the main lesson here is that you shouldn’t be afraid of the various policies and documents, because they are written to help you rather than restrict you. But again, a note about friends’ data: FB has been incredibly aggressive in policing how developers access and use these data, and rightfully so. Last week they shut down the Top Friends app for allowing too much data access, and earlier this year they canned Google Friend Connect because it didn’t operate in accordance with their policies.

I’ll say again that they were right to do this. When thinking through how to port users, keep in mind not just that FB might shut you down, but that a secondary friend who hasn’t opted in to your site probably should be left alone. More than likely, he doesn’t want what you’re selling. Of course, there are ways around this if you want to brute-force it, but we’ll keep that to ourselves. So let’s keep going…
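In code, leaving secondary friends alone can be as simple as importing only the friends who have themselves opted in to your site. Here is a minimal sketch, assuming each friend record carries a boolean we call opted_in (a hypothetical field; how you determine it depends on your own signup flow).

```ruby
# Hypothetical friend record; opted_in means this friend signed up with us.
Friend = Struct.new(:uid, :name, :opted_in)

# Keep only friends who opted in; everyone else is skipped and never stored.
def importable_friends(friends)
  friends.select(&:opted_in)
end

friends = [
  Friend.new(1001, "Ann", true),
  Friend.new(1002, "Bob", false),
  Friend.new(1003, "Cat", true)
]

to_import = importable_friends(friends)
puts "importing #{to_import.map(&:name).join(', ')}; skipping #{friends.size - to_import.size}"
```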

Setting up the Application(s) and managing the exports

Your importer can live inside FB as part of an application, or it can exist as a standalone site. We do it both ways. With FriendCSV, users install the app and we then direct them to their new profile as an add-on; meanwhile, out in the ether, we have a dedicated portal at http://fb.bigsight.org that sends users to FB for initial authentication and then kicks them right back to our web app. If you already own a great app with lots of traffic, start there. If not, it’s probably best to set up your importer out on the web. Exporting the key data for a single user doesn’t take long, so you can typically create a new page or account for them instantly. However, if you plan on exporting an element like friend lists (careful, hoss) or photos, you’ll need to batch FQL requests where possible and be willing to let some processes run in the background.
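Batching means asking for many users per FQL query instead of making one call per friend. Below is a minimal Ruby sketch that only builds the query strings, using FQL’s SELECT ... FROM user WHERE uid IN (...) form. The batch size of 100 and the field list are our assumptions, not documented limits, and actually sending each query (for example through the REST API’s fql.query method) is left out.

```ruby
# Fields to request per user: an assumed subset of the exportable data.
FIELDS = %w[uid first_name last_name education_history].freeze
# Assumed batch size; tune it to what the API tolerates in practice.
BATCH_SIZE = 100

# Build one FQL query per batch of user IDs. Integer() raises ArgumentError
# on anything that isn't a plain number, so untrusted input can't alter
# the query text.
def fql_batches(uids, fields: FIELDS, batch_size: BATCH_SIZE)
  uids.each_slice(batch_size).map do |batch|
    ids = batch.map { |uid| Integer(uid) }.join(", ")
    "SELECT #{fields.join(', ')} FROM user WHERE uid IN (#{ids})"
  end
end

queries = fql_batches((1..250).to_a)
puts "#{queries.size} queries"
puts queries.first
```

Each query string is also a natural unit of work to hand to a background job, so friend-list and photo exports can run without blocking the user’s first page view.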

The FB API is “REST-like,” which means it can be used from anything that can make standard HTTP requests. Libraries exist for PHP, Java, Ruby, and other languages to make the API easier to use. The following example code is for Ruby on Rails and the Facebooker library, as that’s what we use at Bigsight. Whichever language you choose, writing FB applications to extract data is surprisingly easy. One line of code tells your application to authenticate with FB: add “ensure_authenticated_to_facebook” to your Rails controller, and it will send your user to the FB login page if needed and then return them to your application. From that point on, you have full access to the FB user and all exportable data. Here’s one example of how to extract educational history:

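def gather_schools
  # Assumes @fb_user holds the Facebooker user for the current session
  # (e.g. facebook_session.user), available after ensure_authenticated_to_facebook.
  # Create a local copy of the Facebook user
  @user = User.create(:name => @fb_user.name, :fb_uid => @fb_user.uid)
  # Load the user's schools from their education history
  @fb_user.education_history.each do |fb_school|
    School.create(:name => fb_school.name, :user_id => @user.id)
  end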
end
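As written, gather_schools calls User.create every time it runs, so a user who imports twice ends up with two local accounts. Here is a framework-free sketch of an idempotent import keyed on the Facebook UID; the UserStore class and LocalUser struct are hypothetical stand-ins for your own models (in Rails, looking the user up by fb_uid before creating one plays the same role).

```ruby
# Hypothetical local record for an imported Facebook user.
LocalUser = Struct.new(:fb_uid, :name, :schools)

# In-memory stand-in for a users table, indexed by Facebook UID.
class UserStore
  def initialize
    @by_fb_uid = {}
  end

  # The first import creates the record; later imports refresh the name and
  # replace the school list instead of adding a duplicate user.
  def import(fb_uid:, name:, schools:)
    user = (@by_fb_uid[fb_uid] ||= LocalUser.new(fb_uid, name, []))
    user.name = name
    user.schools = schools.uniq
    user
  end

  def find(fb_uid)
    @by_fb_uid[fb_uid]
  end

  def count
    @by_fb_uid.size
  end
end

store = UserStore.new
store.import(fb_uid: 12345, name: "Ann Lee", schools: ["Example High School"])
store.import(fb_uid: 12345, name: "Ann Lee",
             schools: ["Example High School", "Example University"])
puts "#{store.count} user, schools: #{store.find(12345).schools.join(', ')}"
```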

For a full view of the FQL queries, check out the FQL section of the Platform documentation.
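Facebooker hides the HTTP details, but because the API is REST-like, any language can call it directly. As we understand the legacy REST interface, a call is an HTTP POST of form parameters: a method name (users.getInfo, say), your API key, a call_id that increases with each request, a version, and a sig parameter computed as the MD5 hex digest of all the other parameters sorted by name, concatenated as key=value with no separator, followed by your application secret. The sketch below builds such a request body without sending it; the key, secret, and UID are placeholders, and the signing rules should be checked against the official documentation before you rely on them.

```ruby
require "digest/md5"
require "uri"

# Signature as we understand the legacy scheme: sort params by name, join as
# "key=value" with no separator, append the app secret, take the MD5 hex digest.
def sign(params, secret)
  payload = params.sort_by { |key, _| key.to_s }.map { |key, value| "#{key}=#{value}" }.join
  Digest::MD5.hexdigest(payload + secret)
end

# Build a form-encoded POST body for one API call (not sent here).
def signed_request_body(method, args, api_key:, secret:, call_id:)
  params = args.merge("method" => method, "api_key" => api_key,
                      "call_id" => call_id.to_s, "v" => "1.0")
  URI.encode_www_form(params.merge("sig" => sign(params, secret)))
end

body = signed_request_body("users.getInfo",
                           { "uids" => "12345", "fields" => "first_name,last_name" },
                           api_key: "YOUR_API_KEY", secret: "YOUR_SECRET",
                           call_id: 1)
puts body
```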

Integrating FB Data into an Existing Third Party Site

OK, so now that you know what the data look like and how to access them, you need to think through a few things to figure out how to integrate it all with your site or widget. These are the questions to ask:

Conclusion

Like I said above, we believe that FB is on the path to doing something amazing with the web, and that everyone in the industry needs to know not just how to adapt to it, but also how to thrive from (and alongside) it. It should be an interesting summer for the web as Facebook Connect launches and more and more people begin leveraging it and the Platform for utility rather than blind user engagement.

Our opinion is that while FB Connect will offer some amazing functionality for quick user integration and syncing, it likely won’t be as powerful as the Platform in terms of data access. Either way, these developments will not only change how users interact with third-party sites, but will also raise the bar for user experience, as individuals accustomed to the FB UI begin to demand closer alignment with it. Soon we’ll likely see businesses start to build sites on the back of FB rather than a) going out on their own or b) doing what could prove to be complicated integration. We’ll probably also see resolutions to a few ongoing discussions, such as who owns a friends list and how what FB calls “dynamic privacy” actually works out in the wild.

It’s all pretty interesting stuff to think through and incredibly fun to see it all come together so quickly. Creative destruction all around, you know. Lots of warriors in the arena. ARE YOU NOT ENTERTAINED?
