Facebook is improving the 360 video experience by predicting where you will look

From the stage of F8, Joaquin Quinonero, Facebook’s Director of Applied Machine Learning, described a new technique the company is using to improve the viewing experience for 360 videos. The format is challenging to deliver because of its size, but Facebook is using machine learning to reduce the number of pixels that have to be rendered at any one time. By predicting where a viewer will look next, Facebook can give that part of the scene rendering priority, which is particularly helpful for users with slower internet connections.

The status quo for 360 videos is reactive rather than proactive rendering. Mike Coward, engineering director for Facebook’s VR video team, echoed users’ frustration when he described to me the unpleasantness of turning your head in VR only to see a blurry scene.

One partial fix is to optimize compression, but teams at the company are already using machine learning to choose among more than a thousand compression techniques for individual snippets of video. The other way to reduce the streaming load is simply to render less. Rather than reduce quality across the board, Facebook’s approach improves resolution for exactly the region you’re most likely to look at next.
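
Facebook hasn’t detailed how its renderer splits a frame, so the following is only a minimal sketch of the general idea of viewport-adaptive streaming, under stated assumptions: the equirectangular frame is cut into a hypothetical 8×4 grid of tiles, a predicted gaze direction is given as (yaw, pitch) in degrees, and tiles are labeled "high", "medium" or "low" quality by their angular distance from that gaze. The function names and the 45°/90° radii are illustrative, not Facebook’s.

```python
import math

def to_unit_vector(yaw_deg, pitch_deg):
    """Convert a viewing direction (yaw, pitch in degrees) to a 3D unit vector."""
    yaw, pitch = math.radians(yaw_deg), math.radians(pitch_deg)
    return (math.cos(pitch) * math.cos(yaw),
            math.cos(pitch) * math.sin(yaw),
            math.sin(pitch))

def angular_distance_deg(a, b):
    """Great-circle angle in degrees between two (yaw, pitch) directions."""
    va, vb = to_unit_vector(*a), to_unit_vector(*b)
    dot = sum(x * y for x, y in zip(va, vb))
    # Clamp to guard against tiny floating-point overshoot before acos.
    return math.degrees(math.acos(max(-1.0, min(1.0, dot))))

def allocate_tile_quality(predicted_gaze, cols=8, rows=4,
                          high_radius=45.0, medium_radius=90.0):
    """Return a rows x cols grid of quality labels for an equirectangular frame.

    Hypothetical scheme: tiles whose centre lies within high_radius degrees of
    the predicted gaze get "high", within medium_radius get "medium", else "low".
    """
    grid = []
    for r in range(rows):
        # Tile-centre pitch runs from near +90 (top row) to near -90 (bottom row).
        pitch = 90.0 - (r + 0.5) * 180.0 / rows
        row = []
        for c in range(cols):
            # Tile-centre yaw runs from near -180 (left) to near +180 (right).
            yaw = -180.0 + (c + 0.5) * 360.0 / cols
            d = angular_distance_deg(predicted_gaze, (yaw, pitch))
            if d <= high_radius:
                row.append("high")
            elif d <= medium_radius:
                row.append("medium")
            else:
                row.append("low")
        grid.append(row)
    return grid
```

With a gaze straight ahead, `allocate_tile_quality((0.0, 0.0))` marks the four tiles around the centre of the frame "high" and everything behind the viewer "low". Measuring distance on the sphere rather than in pixel space avoids the stretching that the equirectangular projection introduces near the poles.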


Step one was to use the company’s resources to record where people actually look when watching 360 videos. Facebook’s VR video team created a heat map highlighting the spots within videos that users looked at most. From there, Facebook built a generative saliency map using a deep neural network. This model makes it possible to predict where viewers will look in new videos that haven’t previously been watched or studied.
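
The article doesn’t say how Facebook logs or aggregates viewing data, so this is a minimal sketch under stated assumptions of the heat-map step only: head orientations are taken to be recorded as (yaw, pitch) samples in degrees, binned into a hypothetical 8×4 equirectangular grid, and normalized so the grid sums to 1. The deep saliency network that generalizes to unseen videos is not shown; the function names and grid size are illustrative.

```python
from collections import Counter

def build_heat_map(samples, cols=8, rows=4):
    """Bin (yaw, pitch) head-orientation samples into a normalized rows x cols grid."""
    counts = Counter()
    for yaw, pitch in samples:
        # Wrap yaw into [-180, 180) and clamp pitch into [-90, 90].
        yaw = (yaw + 180.0) % 360.0 - 180.0
        pitch = max(-90.0, min(90.0, pitch))
        col = min(int((yaw + 180.0) / 360.0 * cols), cols - 1)
        row = min(int((90.0 - pitch) / 180.0 * rows), rows - 1)
        counts[(row, col)] += 1
    total = sum(counts.values()) or 1  # avoid dividing by zero on empty input
    return [[counts[(r, c)] / total for c in range(cols)] for r in range(rows)]

def most_viewed_tile(heat_map):
    """Return the (row, col) of the tile that received the largest share of views."""
    cells = ((r, c) for r in range(len(heat_map)) for c in range(len(heat_map[0])))
    return max(cells, key=lambda rc: heat_map[rc[0]][rc[1]])
```

For example, three samples just right of centre and one far to the left, `build_heat_map([(10, 5), (20, 10), (30, 15), (-170, 40)])`, put 75 percent of the weight in tile (1, 4), which `most_viewed_tile` reports as the hottest spot. Aggregated maps like this plausibly resemble the kind of ground truth a saliency model could be trained against, though the source doesn’t describe Facebook’s training setup.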

If a human were given the task of predicting where someone might look, they might study the surroundings for anomalies likely to catch a viewer’s interest, such as birds or a passing car.

At the level of the neural net, the physical cars and birds cease to matter. Facebook’s model was trained on a massive corpus of videos to identify interesting subsets of a video frame. Coward told me that when the model is faced with a surfer in the ocean, it can pick out the surfer as the most interesting element, even though the surfer and the water around them are both fast-moving.

After implementing the prediction model, Facebook was able to increase resolution by 39 percent on VR devices. Aside from improving resolution and making 360 videos accessible to people without fast network connections, the technology could someday make it possible to offer creators preemptive suggestions on how to make their videos more engaging.