Official census forms are an invaluable source of demographic data throughout the country, but for trends that occur on the scale of weeks or months rather than years, they’re a bit lacking. A new study shows that data similar to what Facebook uses to target ads could help fill in that blind spot.
Sociologist Emilio Zagheni at the University of Washington looked into the possibility, in this case specifically regarding migrants in the U.S. and their movements between states. He’s previously looked at this topic using Google+ and other internet-based metrics.
Say you wanted to know whether East African migrant populations were tending toward settling in cities, suburbs or rural areas. The census is conducted only once a decade, which is far too long an interval to capture short-term trends that follow, for example, an economic recovery or an important bill.
But by using, or rather strategically misusing, Facebook’s Ads Manager tool, one can find reasonably accurate and up-to-date info on, for example, Somali migrants in the Chicago metro area versus outside the city. Facebook has already extracted all this data, so why not use it?
This data is, of course, not the whole picture. You’re not finding all Somali people in the Chicago area, only Facebook users who choose to accurately report their country of origin and current location. Compared with the data in the Census Bureau’s American Community Survey, it’s not very reliable. But it’s still valuable, Zagheni argues.
“Is it better to have a large sample that is biased, or a small sample that is nonbiased?” he asks in a UW news release. “The American Community Survey is a small sample that is more representative of the underlying population; Facebook is a very large sample but not representative. The idea is that in certain contexts, the sample in the American Community Survey is too small to say something significant. In other circumstances, Facebook samples are too biased.”
“With this project we aim at getting the best of both worlds,” he continues. “By calibrating the Facebook data with the American Community Survey, we can correct for the bias and get better estimates.”
With reliable but scarce ground truth data and noisy but voluminous supplementary data, you can put together a more precise picture than before — as long as you’re careful to control for those biases. Data from other social networks could also be brought in to even things out.
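To make the calibration idea concrete, here is a minimal sketch in Python. It is not the method from the paper, which isn't spelled out here. It assumes the simplest possible correction: a single ratio between survey and Facebook counts, computed in areas where both exist and then applied to every Facebook count. The function names, area labels and all numbers are invented for illustration.

```python
# Hypothetical sketch of ratio calibration: a large but biased sample
# (Facebook ad-audience counts) is rescaled using a small but
# representative one (survey estimates). Not the study's actual model.


def calibration_factor(survey_counts, facebook_counts):
    """Return total survey count / total Facebook count over shared areas."""
    shared = survey_counts.keys() & facebook_counts.keys()
    if not shared:
        raise ValueError("no overlapping areas to calibrate on")
    survey_total = sum(survey_counts[area] for area in shared)
    facebook_total = sum(facebook_counts[area] for area in shared)
    if facebook_total == 0:
        raise ValueError("Facebook counts sum to zero in shared areas")
    return survey_total / facebook_total


def calibrated_estimates(facebook_counts, factor):
    """Scale every Facebook count by the calibration factor."""
    return {area: count * factor for area, count in facebook_counts.items()}


# Invented toy numbers: the survey covers two areas, Facebook covers three.
survey = {"metro_a": 1200, "metro_b": 800}
facebook = {"metro_a": 600, "metro_b": 400, "metro_c": 150}

factor = calibration_factor(survey, facebook)
estimates = calibrated_estimates(facebook, factor)
print(factor, estimates["metro_c"])
```

In this toy case Facebook undercounts by half wherever the survey has data, so the count for the third area, which the survey doesn't cover, is doubled. A real correction would likely vary by age, sex, country of origin and location rather than rely on one global factor.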
Zagheni and his team hope to refine the ideas demonstrated in the paper so that they can be applied in places like developing countries where self-reported data like Facebook’s is easy to come by but reliable government data isn’t. A “good enough” sketch of the population and recent trends could help with things like prioritizing infrastructure investment or directing aid.
It’s unfortunate that the whole thing required the researchers to abuse the advertising system to expose the data — surely Facebook can provide better access for research purposes. I asked the company whether that was a likely possibility. Zagheni seemed to like the idea.
“I certainly hope that there will be opportunities to work directly with Facebook on this line of research in the future,” he wrote in an email to TechCrunch.
The paper describing the team’s work is published in the latest issue of the journal Population and Development Review.