Facebook’s New Colocation And Image Recognition Patents Tease The Future Of Sharing

Facebook’s empire was built on photo tags and sharing, but it’s a grueling process many neglect. Luckily, new Facebook patents give it tech to continuously capture video whenever your camera is open, rank and surface the best images, and auto-tag them with people, places, and businesses. They tease a future where pattern, facial, and audio recognition identify what you’re seeing for easy sharing.

The patents are for Automatic Photo Capture Based on Social Components and Identity Recognition (’80), Preferred images from captured video sequence (’00), and Image selection from captured video sequence based on social components (’65). They were filed in October 2011 and granted over the last two months to Facebook and its employees Andrew “Boz” Bosworth, David Garcia, and Soleio Cuervo (who now works at Dropbox).

The patents cover some colocation technologies similar to those of failed startup Color, which came out of stealth in March 2011, a few months before Facebook filed for the patents. That may be no coincidence, and Color’s ideas for using every available sensor on a phone to tell who someone is with may have inspired Facebook to brainstorm in the space. Soon after Color emerged from stealth, I called on Facebook to develop its own colocation technology to help it forge an “implicit social graph” of who you spend time with. What it came up with could redefine the way we share.

As we look at what the patents include, I’ll be referencing the last two digits of each patent’s number and their PDF page numbers so you can follow along.

Shooting Video While You Take Photos

The foundation of the patents is the idea that Facebook can capture video of everything you see in the viewfinder while you use the camera in its main smartphone apps or its standalone Camera app. It’s like an infinite “Zoe” video that some Android cameras take surrounding a photo:

“Although the camera function operates in a photo-capturing mode, the camera function may continuously capture video…Instead of the user capturing a photo by pressing a hardware (or software) button, [it] can automatically capture…one or more images relevant to the user from the real-time video being captured by the camera function {’80 p4}.”

“[the] user may capture a photo…Meanwhile, the camera function can continue to capture the real-time video {’65 p11}.”

Basically, Facebook could let you take traditional photos while dicing up continuously recorded video into still images. As camera lenses, storage capacity, and wireless connections improve, these images will increase in quality.
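For the technically curious, here’s a rough Python sketch of how an app might hold a rolling window of viewfinder frames and pull periodic stills out of it. Nothing here comes from the patents: the `RollingCapture` class, the 30-second retention window, and the fixed sampling interval are hypothetical choices for illustration.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp: float  # seconds since the camera opened
    pixels: bytes     # stand-in for raw image data


class RollingCapture:
    """Hypothetical buffer holding the most recent viewfinder frames."""

    def __init__(self, max_seconds: float = 30.0) -> None:
        self.max_seconds = max_seconds
        self.frames: deque[Frame] = deque()

    def push(self, frame: Frame) -> None:
        """Record a new frame and drop anything older than the retention window."""
        self.frames.append(frame)
        while frame.timestamp - self.frames[0].timestamp > self.max_seconds:
            self.frames.popleft()

    def stills(self, every_seconds: float) -> list[Frame]:
        """Pick roughly one still per `every_seconds` from the buffered video."""
        picked: list[Frame] = []
        next_time: float | None = None
        for f in self.frames:
            if next_time is None or f.timestamp >= next_time:
                picked.append(f)
                next_time = f.timestamp + every_seconds
        return picked


# Simulate one frame per second for 100 seconds (hypothetical frame rate).
capture = RollingCapture(max_seconds=30)
for t in range(101):
    capture.push(Frame(timestamp=float(t), pixels=b""))
print([f.timestamp for f in capture.stills(every_seconds=10)])  # → [70.0, 80.0, 90.0, 100.0]
```

A real camera app would work with hardware video buffers rather than Python objects, but the idea is the same: keep recording, and decide later which moments deserve to become photos.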

Knowing What You See

What’s special is what Facebook could do with these videos and images. The patents describe the ability to scan the frames for important things like public figures via facial recognition, brands or products via image matching, and landmarks or businesses via pattern and text character recognition plus location:

“For example…a place (e.g., Eiffel Tower, Golden Gate Bridge, Yosemite National Park, Hollywood), a business or an organization (e.g., a coffee shop, San Francisco Giants), or a brand or product (e.g., Coca-Cola, Louis Vuitton)…The image selection process may tag one or more social networking objects identified in the selected frames to the stored video segment {’80 p5}.”

“An object recognition algorithm may use optical character recognition techniques to identify one or more characters (e.g., “HOLLYWOOD”, “San Francisco Giants”) in one or more frames and match against image data (or identity data such as names, logos)…[and] may use computer vision techniques to extract a set of features (e.g., edges, corners, ridges, blobs, curvatures, etc.) from an image. The object recognition algorithm may determine a match between two images by comparing respective sets of features {’00 p11}.”

You might not want to formally tag these things, but the tags would be pre-filled for easy sharing. Whether or not you display the tags, recognizing the presence of these objects and locations can tell Facebook what the most important frames of your video are, exactly where you are, and what types of businesses might want to reach you.
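The ’00 patent only says features are extracted and compared. One common way to do that comparison is nearest-neighbor matching with Lowe’s ratio test, sketched below in Python. To be clear, that’s a swapped-in textbook technique, not necessarily what Facebook uses; the toy descriptor vectors, the 0.75 ratio, and the three-match threshold are assumptions.

```python
import math

Descriptor = tuple[float, ...]


def count_good_matches(query: list[Descriptor], reference: list[Descriptor],
                       ratio: float = 0.75) -> int:
    """Count query features whose nearest reference feature passes Lowe's ratio test."""
    if len(reference) < 2:
        return 0  # the ratio test needs a best and a second-best candidate
    good = 0
    for q in query:
        best, second = sorted(math.dist(q, r) for r in reference)[:2]
        # Accept only if the best match is clearly closer than the runner-up.
        if best < ratio * second:
            good += 1
    return good


def images_match(query: list[Descriptor], reference: list[Descriptor],
                 min_matches: int = 3, ratio: float = 0.75) -> bool:
    """Declare a match when enough features agree (thresholds are assumptions)."""
    return count_good_matches(query, reference, ratio) >= min_matches


# Toy 2-D "descriptors" for a known landmark and a new photo.
landmark = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0), (10.0, 10.0)]
photo = [(0.1, 0.0), (10.0, 0.2), (0.0, 9.9)]
print(images_match(photo, landmark))  # → True
```

In practice the descriptors would come from a feature detector such as SIFT, which produces 128-dimensional vectors, but the logic of “count confident matches, then threshold” carries over.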

Colocating You And Your Friends

Facebook’s patents also give it exciting ways to figure out who you’re with. Instead of manually tagging friends, or even using facial recognition through the Face.com technology it acquired, Facebook could put to work all the sensors in the phones of you and those around you:

“The image selection process can access a GPS sensor…[detect] a user who has GPS coordinates within 100 feet from the first user’s current location…a user who is attending the same event, a user who has just checked in to the same location…data reports from mobile devices of other users that have interacted with the first user’s mobile phone via Bluetooth or Near-Field Communication…The audio recognition algorithm may determine a match between two audio files by comparing fingerprints of the two audio files {’80 p5}.”

That last capability, matching the audio recorded by the microphones of two users to identify that they’re right next to each other, was one of Color’s most exciting features.
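Here’s a hedged Python sketch of two of those signals: the 100-foot GPS check from the ’80 patent, computed with the standard haversine formula, and an audio comparison based on the bit error rate between binary fingerprints. The fingerprint format (one 32-bit integer per audio frame), the 0.35 bit-error threshold, and the `colocation_signals` helper are illustrative assumptions, not details from the patents.

```python
import math

EARTH_RADIUS_FEET = 20_902_231  # mean Earth radius (6,371 km) expressed in feet


def distance_feet(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two GPS fixes (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_FEET * math.asin(math.sqrt(a))


def bit_error_rate(fp_a: list[int], fp_b: list[int], bits: int = 32) -> float:
    """Fraction of differing bits between two equal-length binary audio fingerprints."""
    if not fp_a or len(fp_a) != len(fp_b):
        raise ValueError("fingerprints must be non-empty and equally long")
    differing = sum(bin(a ^ b).count("1") for a, b in zip(fp_a, fp_b))
    return differing / (bits * len(fp_a))


def colocation_signals(me: dict, other: dict,
                       max_feet: float = 100.0, max_ber: float = 0.35) -> list[str]:
    """Return which (hypothetical) colocation signals fire for a pair of phones."""
    signals = []
    if distance_feet(me["lat"], me["lon"], other["lat"], other["lon"]) <= max_feet:
        signals.append("gps")
    if bit_error_rate(me["audio_fp"], other["audio_fp"]) <= max_ber:
        signals.append("audio")
    return signals


me = {"lat": 48.8584, "lon": 2.2945, "audio_fp": [0xDEADBEEF, 0x12345678]}
friend = {"lat": 48.8584, "lon": 2.2946, "audio_fp": [0xDEADBEEF, 0x12345679]}
stranger = {"lat": 48.8606, "lon": 2.3376, "audio_fp": [0x21524110, 0xEDCBA987]}
print(colocation_signals(me, friend))    # → ['gps', 'audio']
print(colocation_signals(me, stranger))  # → []
```

Treating each signal as independent evidence mirrors how the patent lists them as alternatives; a production system would presumably weigh them together rather than trust any single one.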

Identifying What’s Important

So Facebook knows who and what’s around you. Then it wants to rank which frames of your video are the most interesting to you. To do that, it looks at your affinity to these objects (which friends, locations, or brands you interact with most) and how popular they are with the public. It combines this data with extra audio and image cues of importance:

“The image selection process may analyze content of the voice segments (e.g., by using a speech recognition algorithm) for indication of importance (e.g., “Say cheese!”, “Cheese!”, “This is beautiful!”, “Amazing!”), and adjust a score of a frame…[and] may analyze picture quality of a frame (e.g., blurriness) by accessing a motion sensor (e.g., an accelerometer), and adjust a score of a frame less favorably if the frame corresponds to a time period of significant vibration or movement of the mobile device {’80 p6}.”

Honestly, the idea that an app could hear you say “This is beautiful” and know the photos and video you take at that time are important is sci-fi brilliant.
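A toy version of that scoring in Python might look like the following. The patents describe the ingredients (affinity, popularity, speech cues, accelerometer-based blur detection) but not how to weigh them, so the additive score, the phrase bonus, and the shake threshold and penalty below are all made-up placeholders.

```python
from dataclasses import dataclass, field

# Example cue phrases quoted in the '80 patent.
IMPORTANCE_PHRASES = ("say cheese", "cheese", "this is beautiful", "amazing")


@dataclass
class FrameSignals:
    timestamp: float
    objects: list[str] = field(default_factory=list)  # recognized friends, places, brands
    transcript: str = ""         # speech recognized around this frame
    accel_variance: float = 0.0  # accelerometer variance during capture


def score_frame(sig: FrameSignals, affinity: dict[str, float], popularity: dict[str, float],
                speech_bonus: float = 1.0, shake_threshold: float = 2.0,
                shake_penalty: float = 0.5) -> float:
    """Hypothetical scoring: every weight and threshold here is a placeholder."""
    # Start from how much the user (affinity) and the public (popularity) care about what's in frame.
    score = sum(affinity.get(o, 0.0) + popularity.get(o, 0.0) for o in sig.objects)
    # Bump frames captured while someone says something like "This is beautiful!".
    if any(p in sig.transcript.lower() for p in IMPORTANCE_PHRASES):
        score += speech_bonus
    # Dock frames captured while the phone was shaking (likely blurry).
    if sig.accel_variance > shake_threshold:
        score *= shake_penalty
    return score


affinity = {"alice": 2.0, "golden gate bridge": 1.0}
popularity = {"golden gate bridge": 0.5}
steady = FrameSignals(12.0, ["alice", "golden gate bridge"], "This is beautiful!", 0.1)
shaky = FrameSignals(13.0, ["alice", "golden gate bridge"], "", 5.0)
print(score_frame(steady, affinity, popularity))  # → 4.5
print(score_frame(shaky, affinity, popularity))   # → 1.75
```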

Choosing Your Best Moments

At this point, Facebook can make well-educated guesses about what you want to share. It imagines splaying out its top picks in what it calls a “media wheel” and showing the best frames as thumbnails when you share to the news feed:

“The image capturing process can cause the camera function to display in its graphical user interface (201) selectable thumbnails corresponding to the one or more selected frames in a scrollable media wheel panel (220) adjacent to the view finder (230) {’80 p6}.”

“the preferred image selection process can cause news feed engine (110) to construct a news feed entry comprising thumbnails corresponding to the selected frames {’00 p12}.”


Essentially, Facebook would rank all the frames of your video, show you the best ones, and you could select your favorites to represent your video when you share it. When people want to watch the video, they can click on one of the thumbnails and instantly view the video starting 10 seconds before that frame. As the quality of still images ripped from video improves, Facebook could likely even allow you to share the frames as photos.
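Picking the thumbnails could be as simple as the Python sketch below: take the highest-scoring frames, skip near-duplicates, and point each thumbnail at a playback start 10 seconds earlier. The `pick_thumbnails` name and the `min_gap` de-duplication rule are hypothetical; only the 10-second lead comes from the description of the patents.

```python
def pick_thumbnails(scored: list[tuple[float, float]], k: int = 3,
                    lead_seconds: float = 10.0,
                    min_gap: float = 5.0) -> list[tuple[float, float]]:
    """Return (thumbnail_time, playback_start) pairs for the top-k frames, in video order.

    `scored` holds (timestamp_seconds, score) pairs. The min_gap rule, which skips
    frames too close to one already chosen, is a hypothetical de-duplication step.
    """
    chosen: list[float] = []
    for t, _score in sorted(scored, key=lambda f: f[1], reverse=True):
        if len(chosen) == k:
            break
        if all(abs(t - c) >= min_gap for c in chosen):
            chosen.append(t)
    # Each thumbnail starts playback lead_seconds earlier, clamped to the start of the video.
    return [(t, max(0.0, t - lead_seconds)) for t in sorted(chosen)]


frames = [(4.0, 0.2), (12.0, 3.1), (13.0, 3.0), (25.0, 1.0), (31.0, 4.5), (40.0, 0.9)]
print(pick_thumbnails(frames))  # → [(12.0, 2.0), (25.0, 15.0), (31.0, 21.0)]
```

Greedy selection by score keeps the sketch short; the gap check is what stops the media wheel from filling up with three nearly identical frames of the same smile.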

Assisted Sharing

These patents could redefine how we share. You wouldn’t need to search for people, locations, or things to tag. They’ll just be there waiting for your approval. That’s a big win on mobile where you want to share and get back to your life. Your content will contain so much structured data that Facebook could better route it to the people who’ll find it most interesting. And when we consume video, an opaque medium that’s classically tougher to skim than photos, there’ll be anchors and highlights pointing us to the most important moments. These could all encourage sharing and expose us to more enjoyable content. Facebook could even use colocation to create collaborative photo albums and carry on Color’s mission to let you “Take photos together”.

There are also huge implications for Facebook’s business. Likes are a terribly inaccurate graph of what businesses and places you care about. Today, recommendations of what to Like are scattershot, and it’s a chore to key them in. By recognizing businesses and brands in photos and videos, Facebook can add them to its treasure trove of information about your preferences. That will help it fill out Graph Search and target you with increasingly accurate ads and ecommerce opportunities.

Imagine if Coca-Cola could target ads to people frequently seen with cans of Pepsi in their photos: high-potential customers who specifically don’t like Coke yet. A restaurant could push real-time ads to people shooting videos of a landmark next door. Facebook could open up ads APIs to surface characteristics like “currently with three or more friends” that a bar could take advantage of to advertise to nearby groups.

The war for ad dollars will be won with data. Facebook already sees over 300 million photos uploaded each day. These patents could be a diamond drill, allowing Facebook to mine much more business information out of every piece of user-generated content it imports.

The world’s premier social network is 9 years old now. In some ways, that’s actually a disadvantage. People have forged connections with too many people and things they don’t actually care about. Facebook doesn’t know much about your offline life, or whether someone is a close friend or a distant acquaintance in that realm. That leads to a boring news feed — a huge danger to Facebook’s engagement-based business model.

Facebook has spent years trying to get people to put in work to explicitly prune and classify their relationships with Friend Lists and news feed filters, but most still don’t. Implicit colocation and business identification could bring new richness and detail to its social graph, so your meatspace experience enhances your world of ones and zeros.