In the race to build ever more sophisticated deep learning models, Facebook has a secret weapon: billions of images on Instagram.
In research the company is presenting today at F8, Facebook details how it took billions of public Instagram photos that users had annotated with hashtags and used that data to train its own image recognition models. The company relied on hundreds of GPUs running around the clock to parse the data, and ultimately came away with deep learning models that beat industry benchmarks, the best of which achieved 85.4 percent accuracy on ImageNet.
If you’ve ever put a few hashtags onto an Instagram photo, you’ll know doing so isn’t exactly a research-grade process. There is generally some sort of method to why users tag an image with a specific hashtag; the challenge for Facebook was sorting what was relevant across billions of images.
When you’re operating at this scale (the largest of the tests used 3.5 billion Instagram images spanning 17,000 hashtags), even Facebook doesn’t have the resources to closely supervise the data. While other image recognition benchmarks may rely on millions of photos that human beings have pored through and annotated personally, Facebook had to find methods for cleaning up what users had submitted that would work at scale.
The “pre-training” research focused on developing systems for finding relevant hashtags; that meant discovering which hashtags were synonymous while also learning to prioritize more specific hashtags over the more general ones. This ultimately led to what the research group called the “large-scale hashtag prediction model.”
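To make that cleanup step concrete, here is a minimal sketch of the two ideas described above: collapsing synonymous hashtags into a single label, and dropping a general label when a more specific one is present. It is an illustration only, not Facebook’s actual pipeline. The `SYNONYMS` and `PARENT` tables and the `clean_hashtags` function are hypothetical toy stand-ins for what would, in practice, be a far larger vocabulary.

```python
# Hypothetical toy data for illustration; not Facebook's vocabulary or code.
# Maps a raw hashtag to the canonical label it is treated as a synonym of.
SYNONYMS = {
    "doggo": "dog",
    "pup": "dog",
    "golden": "golden_retriever",
    "goldenretriever": "golden_retriever",
}

# Maps each canonical label to its more general parent (assumed hierarchy).
PARENT = {
    "golden_retriever": "dog",
    "dog": "animal",
}

# Labels the model is allowed to predict; anything else is discarded.
VOCAB = set(PARENT) | set(PARENT.values()) | set(SYNONYMS.values())


def ancestors(label):
    """Return every more general label above `label` in the toy hierarchy."""
    found = set()
    while label in PARENT and PARENT[label] not in found:
        label = PARENT[label]
        found.add(label)
    return found


def clean_hashtags(raw_tags):
    """Merge synonyms, drop out-of-vocabulary tags, keep the most specific labels."""
    canonical = []
    for tag in raw_tags:
        tag = tag.lstrip("#").lower()
        label = SYNONYMS.get(tag, tag)
        if label in VOCAB and label not in canonical:
            canonical.append(label)

    # Any label that is an ancestor of another label in the list is redundant.
    too_general = set()
    for label in canonical:
        too_general |= ancestors(label)

    return [label for label in canonical if label not in too_general]


if __name__ == "__main__":
    print(clean_hashtags(["#Doggo", "#golden", "#dog", "#lit"]))
```

In this toy version, a photo tagged #Doggo, #golden, #dog and #lit ends up with the single label `golden_retriever`: the synonyms are merged, #lit falls outside the vocabulary, and `dog` is dropped because a more specific label is available.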
The privacy implications here are interesting. On one hand, Facebook is only using what amounts to public data (no private accounts), but when a user posts an Instagram photo, how aware are they that they’re also contributing to a database that’s training deep learning models for a tech mega-corp? These are the questions of 2018, but they’re also issues that Facebook is undoubtedly growing more sensitive to out of self-preservation.
It’s worth noting that these models are centered on object-focused image recognition. Facebook won’t be able to use this data to predict who your #mancrushmonday is, and it also isn’t using the database to finally understand what makes a photo #lit. It can tell dog breeds, plants, food and plenty of other things that it’s grabbed from WordNet.
The accuracy gained from using this data isn’t necessarily the impressive part here. The increases in image recognition accuracy were only a couple of points in many of the tests; what’s fascinating is the pre-training process that turned such vast, noisy data into something effective despite only weak supervision. The models this data trained will be pretty universally useful to Facebook, but image recognition could also bring users better search and accessibility tools, as well as strengthen Facebook’s efforts to combat abuse on its platform.