In a sign that computers will be able to perform image analysis as fluently as text analysis, a group of Stanford-based researchers were able to make accurate predictions about neighborhood voting patterns based on millions of pictures collected from Google Street View, reports The New York Times. While other academic projects have used artificial intelligence to mine Google Street View for socioeconomic insights (such as Streetchange), this project is notable because of the vast quantity of images that its AI software processed.
Led by Stanford computer vision scientist Timnit Gebru, the team of researchers used software to analyze 50 million street-scene images along with location data. Their goal was to find data that could be used to predict demographic statistics at the level of zip codes and precincts, which usually contain about 1,000 people.
From those images, they were able to glean information, including make and model, about 22 million cars, or 8% of all cars in the country, in 3,000 zip codes and 39,000 voting districts. After cross-referencing that data with information from other sources, including the Census Bureau’s American Community Survey and presidential election voting records, the researchers found that they were able to make accurate predictions about a neighborhood’s income, race, education and voting patterns.
To get their AI algorithms to classify cars accurately, the researchers trained them by recruiting hundreds of people from places like Mechanical Turk, as well as car experts, to identify vehicles in a sample of millions of pictures. In the end, their software was able to classify cars in 50 million images in just two weeks, a task the Times said would have taken a human expert 15 years to finish.
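To put those figures in perspective, the arithmetic behind them can be sketched in a few lines of Python. This is a rough back-of-the-envelope check, not a calculation from the researchers: it assumes "two weeks" means 14 full days of continuous processing, that a year has 365 days, and that the hypothetical human expert works around the clock. The function names are illustrative only.

```python
# Back-of-the-envelope check of the throughput figures quoted above.
# Assumptions (not from the article): 14 days of nonstop processing,
# 365-day years, and a human expert working without breaks.

IMAGES = 50_000_000
SECONDS_PER_DAY = 24 * 60 * 60


def images_per_second(days: float) -> float:
    """Average images processed per second over the given number of days."""
    return IMAGES / (days * SECONDS_PER_DAY)


def seconds_per_image(years: float) -> float:
    """Average seconds spent per image over the given number of years."""
    return years * 365 * SECONDS_PER_DAY / IMAGES


software_rate = images_per_second(14)
human_seconds = seconds_per_image(15)

print(f"Software: about {software_rate:.0f} images per second")
print(f"Human expert: about {human_seconds:.1f} seconds per image")
```

Under those assumptions, the software averages roughly 41 images every second, while the 15-year estimate for a human works out to a little under 10 seconds per image.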
In an article published in the Proceedings of the National Academy of Sciences, the team wrote that their technology can supplement the American Community Survey, which costs more than $250 million each year to perform. Because the survey is also labor-intensive, with workers going door to door, smaller areas with populations of fewer than 65,000 are often overlooked. As technology improves, demographic statistics may eventually be updated in real time, though the researchers noted that policymakers will need to make sure data is collected only at the community level to safeguard individual privacy.