This week, a handful of academic research teams will gain access to a new tool from Facebook designed to aggregate near-universal real-time data on the world’s biggest social network.
When it comes to who gets access to Facebook data and how, the company now known as Meta is still feeling reverberations from 2018’s Cambridge Analytica scandal, in which a political consulting firm harvested the personal data of millions of unaware Facebook users to build detailed profiles on potential voters. The company shut down thousands of APIs in the three years that followed and is only now beginning to restore broad access for academic research.
TechCrunch previewed Facebook’s new academic research API and spoke with Facebook Product Manager Kiran Jagadeesh, who spearheaded the project with the Facebook Open Research & Transparency (FORT) team.
“This is just the beginning,” Jagadeesh told TechCrunch, characterizing the Researcher API as a beta version of the toolkit the team eventually hopes to offer. The API, first announced at F8 this year, is Python-based and runs in JupyterLab, an open-source notebook interface.
In light of Facebook’s many past privacy woes, the new Researcher API comes with some initial caveats. First, the API will only be made available to a small group of established academic researchers through an invite-only system. The company plans to expand access beyond the initial test group in February 2022, incorporating feedback from the trial into a broader launch to all academics.
Another precaution: The Researcher API runs in a very controlled environment that Jagadeesh described as a “digital clean room.” Academic researchers with access to the API can enter the environment through a Facebook VPN, collect data and crunch numbers, but the raw data can’t be exported — only the analysis.
The idea is to protect user privacy and to prevent any analyzed data from being re-identified. But the limitation might rub some of the company’s critics the wrong way, considering that all of the public data the Researcher API gathers is already out there, just difficult to aggregate and analyze with Facebook’s existing tools.
At launch the API will provide access to four buckets of real-time Facebook data: pages, groups, events and posts. In each case, the tool will only pull from public data and only from sources within the U.S. and the EU initially. For groups, events and pages, at least one administrator will need to be located in a supported country for that data to be made available through the API.
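To make the inclusion rule for pages, groups and events concrete, here is a minimal Python sketch. It is not Facebook’s code or the Researcher API itself: the `PublicEntity` class, the `is_eligible` function and the `"US"`/`"EU"` region labels are hypothetical names invented for illustration, and the logic assumes only what is described above (public data only, and at least one administrator in a supported region).

```python
from dataclasses import dataclass

# Hypothetical region labels standing in for "the U.S. and the EU".
SUPPORTED_REGIONS = {"US", "EU"}


@dataclass
class PublicEntity:
    """Hypothetical stand-in for a page, group or event."""
    name: str
    is_public: bool
    admin_regions: tuple = ()  # one region label per administrator


def is_eligible(entity: PublicEntity) -> bool:
    """Return True if the entity is public and at least one
    administrator is located in a supported region."""
    has_supported_admin = any(
        region in SUPPORTED_REGIONS for region in entity.admin_regions
    )
    return entity.is_public and has_supported_admin


if __name__ == "__main__":
    examples = [
        PublicEntity("Local news page", True, ("US",)),
        PublicEntity("Mixed-admin group", True, ("BR", "EU")),
        PublicEntity("Private group", False, ("US",)),
        PublicEntity("Out-of-region event", True, ("BR",)),
    ]
    for entity in examples:
        print(entity.name, is_eligible(entity))
```

Under these assumptions, a group with one administrator in Brazil and one in the EU would qualify, while a private group or one with no administrator in a supported region would not.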
Through the tool, researchers can analyze large swaths of raw text using methodologies like sentiment analysis, which tracks the valence and emotions people express through their speech on a given topic. Beyond the text-based posts that make up most of the available data, researchers can also access related information like group and page descriptions, their creation dates, and post reactions.
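For readers unfamiliar with sentiment analysis, the sketch below shows the simplest form of the idea in Python: counting positive and negative words to produce a rough valence score. The word lists and the `valence` function are illustrative assumptions, not part of the Researcher API, and real research typically relies on far larger lexicons or trained models.

```python
import re

# Tiny illustrative word lists (assumptions, not a real lexicon).
POSITIVE = {"good", "great", "love", "happy", "wonderful"}
NEGATIVE = {"bad", "awful", "hate", "angry", "terrible"}


def valence(text: str) -> float:
    """Return a crude valence score in [-1, 1]:
    (positive words - negative words) / total words."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    positive = sum(word in POSITIVE for word in words)
    negative = sum(word in NEGATIVE for word in words)
    return (positive - negative) / len(words)


if __name__ == "__main__":
    posts = [
        "I love this great community event",
        "Awful turnout and a terrible venue",
        "The meeting starts at noon",
    ]
    for post in posts:
        print(f"{valence(post):+.2f}  {post}")
```

A score above zero suggests more positive than negative wording, a score below zero the reverse, and zero means the text contained neither or an equal number of each.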
Multimedia data like raw images won’t be included, nor will comments or user demographic data (age, gender, etc.). The API also won’t collect any data from Instagram, though Jagadeesh recognizes that the platform is very valuable for researchers, and the team is exploring ways to make Instagram data available.
The FORT team is hoping to work closely with academic researchers to develop and build out the current tools, which Jagadeesh describes as a work in progress. While Meta indicated that its initial set of academic partners isn’t yet nailed down, the company has invited researchers from 23 academic institutions around the globe to kick the tires.
Researchers who have completed the team’s onboarding process and agreed to its privacy policies were granted access on Monday, November 15. Facebook requires anyone accessing the API to agree to privacy constraints, including not re-identifying specific individuals within the data.
The research API is only available to a handful of academic institutions for now, but the FORT team plans to explore granting access to other groups, including journalists. The goal is to create a public roadmap that gives researchers and journalists a transparent look at what the team is working toward.
The company has plenty of trust-building to do in the research community. In August, Facebook cut off access to advertising data for two prominent researchers affiliated with NYU’s Cybersecurity for Democracy project, prompting a rebuke from many academics and regulators. Those researchers focused on tracking misinformation and political ads through an opt-in browser tool called Ad Observer. In September, Facebook apologized to an elite group of researchers known as Social Science One for providing them with incomplete data — a mistake that undermined months of work and analysis.