Google Research prototypes ambient audio contextual content

A team from Google Research has developed a prototype system that uses a home computer’s internal microphone to listen to the ambient audio in a room, determine what is being watched on TV, and offer web-based supplemental information, services and shopping relevant to the program being watched. It sounds strange, but it appears to work, and people might really like it. There’s no indication yet whether or when this could be available as a service.

Google Research team members Michele Covell and Shumeet Baluja, along with Michael Fink of the Hebrew University of Jerusalem’s Center for Neural Computation, received the best paper award for their report on the system at the Euro ITV (interactive television) conference last week. (“Social- and Interactive-Television Applications Based on Real-Time Ambient-Audio Identification,” 10-page PDF; see also the Google Research blog post on the paper.)

The system compresses the captured audio into irreversible (emphasis theirs) summary statistics which are then compared to a database of mass media statistics and used to determine what the browser should display. Possible service offerings discussed in the paper fall into four categories:

  • Personalized information layers: Here’s what Tom Cruise is wearing in the show you are watching, and here’s where you can buy the same clothes in your zip code.
  • Ad hoc social peer communities: If you would like to chat about this show, ten of your college friends are watching it right now as well.
  • Real-time popularity ratings: Nielsen’s ratings require dedicated hardware, and the results aren’t available in real time. You might want to know if there is a spike in viewers watching the show on channel 9 right now. Advertisers might want to know that too.
  • TV-based bookmarks: Click to save a show or clip to your video library, and there will be more than just a few shows available to watch later.
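The popularity-ratings and peer-community ideas above could both be served from the same stream of match results. The sketch below is a minimal illustration under stated assumptions: the `MatchReport` record (anonymous user ID, matched channel, timestamp), the 60-second window and the function names are hypothetical, not details from the paper. Each user is counted only on their most recent channel in the window.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, List, Sequence


@dataclass(frozen=True)
class MatchReport:
    user_id: str      # hypothetical anonymous client identifier
    channel: str      # channel the server matched the client's audio to
    timestamp: float  # seconds since the epoch


def latest_per_user(reports: Sequence[MatchReport], now: float,
                    window_s: float) -> Dict[str, MatchReport]:
    """Keep each user's most recent report inside the time window.

    Using only the latest report means a channel-surfing viewer is
    counted once, on the channel they are on now.
    """
    latest: Dict[str, MatchReport] = {}
    for report in reports:
        if now - window_s <= report.timestamp <= now:
            previous = latest.get(report.user_id)
            if previous is None or report.timestamp > previous.timestamp:
                latest[report.user_id] = report
    return latest


def live_ratings(reports: Sequence[MatchReport], now: float,
                 window_s: float = 60.0) -> Counter:
    """Real-time popularity: number of current viewers per channel."""
    return Counter(r.channel for r in latest_per_user(reports, now, window_s).values())


def peers_watching(reports: Sequence[MatchReport], channel: str,
                   friends: Iterable[str], now: float,
                   window_s: float = 60.0) -> List[str]:
    """Ad hoc community: which of a user's friends are on the same channel now."""
    latest = latest_per_user(reports, now, window_s)
    return sorted(u for u in friends if u in latest and latest[u].channel == channel)


reports = [
    MatchReport("u1", "ch9", 100.0),
    MatchReport("u2", "ch9", 110.0),
    MatchReport("u1", "ch4", 115.0),  # u1 surfed from channel 9 to channel 4
    MatchReport("u3", "ch2", 10.0),   # too old for the 60-second window
]
print(live_ratings(reports, now=120.0))
print(peers_watching(reports, "ch9", {"u1", "u2", "u3"}, now=120.0))  # prints ['u2']
```

Keying on the latest report per user is one simple way to avoid double-counting viewers who change channels; a real service would also need to decide how stale a report may be before the viewer is dropped.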

The system requires no dedicated television-connected hardware, protects users’ privacy and is technically feasible, the researchers report. In their experiments, a laptop sat in the lap of a person ten feet from a television who was engaged in a loud conversation with someone next to them, and the system still succeeded in providing matching online content once channel surfing was taken into account.
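To make the matching step concrete, here is a minimal Python sketch of comparing an irreversible audio summary against a database of stored summaries. It is a simplified stand-in loosely modeled on band-energy fingerprinting, not the descriptor used in the paper: `summarize`, `hamming`, `best_match`, the 33-band split and the 6-bit `max_distance` threshold are all hypothetical choices. Matching by Hamming distance instead of exact equality is one way to tolerate some bit flips caused by room noise.

```python
from typing import Dict, Optional

import numpy as np


def summarize(samples: np.ndarray, n_bands: int = 33) -> int:
    """Reduce an audio clip to a 32-bit descriptor (hypothetical stand-in).

    Each bit records whether one frequency band holds more energy than the
    band below it. Phase and fine spectral detail are discarded, so the
    original audio cannot be reconstructed from the descriptor.
    """
    spectrum = np.abs(np.fft.rfft(samples)) ** 2
    energies = np.array([band.sum() for band in np.array_split(spectrum, n_bands)])
    bits = energies[1:] > energies[:-1]  # n_bands - 1 = 32 comparisons
    descriptor = 0
    for bit in bits:
        descriptor = (descriptor << 1) | int(bit)
    return descriptor


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two descriptors."""
    return bin(a ^ b).count("1")


def best_match(query: int, database: Dict[str, int],
               max_distance: int = 6) -> Optional[str]:
    """Return the program whose stored descriptor is nearest, or None if none is close enough."""
    if not database:
        return None
    program, distance = min(
        ((name, hamming(query, stored)) for name, stored in database.items()),
        key=lambda pair: pair[1],
    )
    return program if distance <= max_distance else None


rng = np.random.default_rng(0)
clip_a = rng.standard_normal(4096)  # synthetic stand-ins for broadcast audio
clip_b = rng.standard_normal(4096)
database = {"program_a": summarize(clip_a), "program_b": summarize(clip_b)}
print(best_match(summarize(clip_a), database))  # prints program_a
```

A production matcher would compare sequences of descriptors over time rather than a single clip, and would index the database instead of scanning it linearly; both are omitted here for brevity.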

Lest you fear that a database covering all of broadcast TV would be enormous, the report says that if the database of summaries holds only 32-bit descriptors of 5-second clips, up to one year of broadcast information could be held in less than 1 GB. The researchers add that re-runs make this even more feasible.
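The storage figure can be checked with back-of-the-envelope arithmetic. This sketch assumes one 32-bit descriptor per non-overlapping 5-second clip per channel, a 365-day year and 1 GB taken as 10^9 bytes; the paper’s exact accounting may differ. Under those assumptions one channel-year needs about 25 MB, so the 1 GB budget covers roughly 39 channel-years.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 s (non-leap year)
CLIP_SECONDS = 5                        # assumed: one descriptor per non-overlapping 5-second clip
BYTES_PER_DESCRIPTOR = 32 // 8          # a 32-bit descriptor is 4 bytes
BUDGET_BYTES = 10**9                    # 1 GB, taken as 10^9 bytes

descriptors_per_channel_year = SECONDS_PER_YEAR // CLIP_SECONDS
bytes_per_channel_year = descriptors_per_channel_year * BYTES_PER_DESCRIPTOR
channel_years_in_budget = BUDGET_BYTES // bytes_per_channel_year

print(descriptors_per_channel_year)  # 6307200
print(bytes_per_channel_year)        # 25228800
print(channel_years_in_budget)       # 39
```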

Privacy concerns were addressed in the prototype by compressing captured audio on the user’s computer before transmitting summary data to the database for comparison, and by offering a mute button in the program. Given Google’s recent ethical controversies, these privacy measures may not be enough to ease some people’s concerns.
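The two client-side privacy measures can be sketched in a few lines: only locally computed 32-bit summaries are serialized for upload, and nothing is produced at all while the user has pressed mute. The `build_upload` function and its JSON wire format are hypothetical illustrations, not the prototype’s actual protocol.

```python
import json
from typing import Optional, Sequence


def build_upload(descriptors: Sequence[int], muted: bool) -> Optional[str]:
    """Prepare what leaves the user's machine (hypothetical wire format).

    Raw audio is never included, only 32-bit summaries computed locally.
    When the user presses mute, nothing is sent at all.
    """
    if muted:
        return None
    for d in descriptors:
        if not 0 <= d < 2**32:
            raise ValueError(f"descriptor out of 32-bit range: {d}")
    return json.dumps({"descriptors": list(descriptors)})


print(build_upload([7, 42], muted=False))  # prints {"descriptors": [7, 42]}
print(build_upload([7, 42], muted=True))   # prints None
```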