The Challenges Of Tomorrow's Multimedia As Seen Through The Eyes Of Google, Yahoo, Nokia And Others

Part of this year’s ACM Multimedia conference, the Multimedia Grand Challenge 2009 aims to collect the specific problems and issues that companies like Google, Yahoo, Nokia, HP, Radvision and CeWe see arising on the multimedia horizon over the next 2-5 years.

Researchers from around the world are encouraged to submit working systems that significantly address the challenges defined by the companies cited above, in order to win the Grand Challenge competition (prizes to be defined). The deadline for submissions is June 15th.

The six companies have put forward eight challenges so far, all of which make for interesting reading:

Yahoo!

– Robust Automatic Segmentation of Video According to Narrative Themes

The challenge to researchers in the multi-media community is to develop methods, techniques, and algorithms to automatically generate narrative themes for a given video, as well as present the content in an easy-to-consume manner to end-users in a search engine experience.

– Robust Clustering Guided by User Intent in Image Search

With the growing number of images on the Internet it is important to have the ability to organize and surface the images in the most efficient, meaningful way possible so that more images can be surfaced to searchers. The challenge to researchers in the multi-media community is to 1) develop a robust way of understanding user intent and 2) generate highly relevant clusters for the given intent and query.

Google

– Robust, As-Accurate-As-Human Genre Classification for Video

A notion of browsing collections is naturally associated with videos. Having videos classified into a pre-existing hierarchy of genres is one way to make the browsing task easier. The goal of this task would be to take user generated videos (along with their sparse and noisy metadata) and automatically classify them into genres.

Nokia

– Where was this Photo Taken, and How?

This challenge focuses on capture device location and orientation, one dimension of content metadata. The problem can be stated simply: try to derive exact camera poses (location and orientation) of given photos that are lacking location annotation. This kind of technology could potentially be used to add metadata to existing or newly captured photos.

– Robust Identification of Informative Multimedia Content in Web Pages

In recent years, there has been research in web content analysis and extraction that attempts to tackle similar problems, but much of it emphasizes the textual information instead of the associated multimedia data. Thus, this Grand Challenge invites solutions to the robust identification and extraction of informative multimedia content for any arbitrary web page authored in any language, not just English. Ideally, we would like to have a Grand Challenge solution that is over 99% accurate for any web page of any language.

Radvision

– Video Conferencing To Surpass “In-Person” Meeting Experience

The great challenge for video conferencing vendors is to supply users with a meeting experience that equals or surpasses “in-person” meetings. It is assumed that once the meeting experience is good enough, or even better, the technology could potentially minimize the need for “physical” meetings (at least for business purposes).

– Real-time Data Collaboration Adaptation for Multi-Device Video Conferencing

With the video conferencing market moving out of the meeting rooms and into laptops, netbooks, mobile devices, etc., data collaboration becomes a big challenge. The data, usually sent in high, native PC resolution (such as XGA), has to be adapted to multiple devices, each with its own processing and screen capabilities. This challenge focuses on adapting, in real time, the data collaboration channel to different receiving devices, in a way that users would regard as perceptually optimal.

CeWe

– The Next Generation of Tangible Multimedia Products

The open issue is how to help the user determine a meaningful subset of photos out of a collection, one which best summarizes and represents the specific event. This is still not satisfactorily solved after years of research in multimedia analysis and retrieval.

These are straightforward, fascinating challenges, and it will be interesting to see how researchers respond to the issues put forward. There’s still room for other corporations to participate in the Multimedia Grand Challenge 2009; contact details are at the bottom of this page.

The top submissions will be presented at a special event during the MM 2009 conference in Beijing, China. Based on those presentations, winners will be selected for Grand Challenge awards.

We’ll be sure to write a follow-up post on the winners!