ChaLearn Challenges You To Teach A Kinect Instant Gesture Recognition

There’s seemingly no end to the clever things people can do with a little know-how and a Kinect camera, and now the machine learning enthusiasts at ChaLearn want to use the Xbox accessory to change the way computers handle gesture controls.

In short, they’re challenging the world’s data tinkerers to develop a learning system that allows a Kinect to recognize physical gestures in one shot.

Why one shot? The way ChaLearn looks at it, if gesture-based control is ever going to become a staple of how we interact with our technology, defining those gestures can’t be an overly complex process. What ChaLearn wants teams to deliver is a system in which a gesture can be defined and subsequently recognized after being performed only once. After all, if a human can do it, why shouldn’t a machine be able to?

The competition is being run on Kaggle, the data modeling competition platform that recently wrapped up an $11 million funding round. The gesture-learning challenge is one of nearly 30 that Kaggle hosts, which run the gamut from asking users to determine whether a car bought at auction is a lemon to predicting which patients will be admitted to a hospital by parsing claims data.

As you can imagine, only the hardiest of data crunchers need apply. Competitors are given a Kinect’s RGB video and spatial depth data of a subject performing a series of gestures, and are tasked with finding a way to predict the identity of those gestures as defined in a separate “truth file.”

Here’s a brief snippet from the challenge’s description that should give you an idea of the sort of work involved:

For each video, you provide an ordered list of labels R corresponding to the recognized gestures. We compare this list to the corresponding list of labels T in the prescribed list of gestures that the user had to play. These are the “true” gesture labels (provided that the users did not make mistakes). We compute the so-called Levenshtein distance L(R, T), that is, the minimum number of edit operations (substitution, insertion, or deletion) that one has to perform to go from R to T (or vice versa). The Levenshtein distance is also known as “edit distance”.
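If that description sounds abstract, here’s a minimal sketch in Python of the Levenshtein distance the organizers describe, computed between a recognized label sequence R and a true sequence T using the standard dynamic programming recurrence. The numeric gesture labels in the example at the bottom are hypothetical stand-ins; the challenge data defines its own label vocabulary.

```python
def levenshtein(R, T):
    """Minimum number of edit operations (substitution, insertion,
    or deletion) needed to turn sequence R into sequence T."""
    m, n = len(R), len(T)
    # dist[i][j] = edit distance between R[:i] and T[:j]
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete every label in R[:i]
    for j in range(n + 1):
        dist[0][j] = j  # insert every label in T[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if R[i - 1] == T[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dist[m][n]

# Two wrong labels plus one missed gesture -> distance 3
print(levenshtein([4, 2, 7, 7], [4, 3, 7, 1, 5]))
```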

It’s going to be a lot of work even if you’ve boned up on your Levenshtein distances, but the winning teams will be handsomely rewarded. Thanks to the prominent use of Kinects in the challenge, Microsoft has thrown its support behind the competition in the form of $10,000 to be split among the top three teams. What’s more, if Microsoft is fond of your solution, it has the option of licensing your work in exchange for a payout as large as $100,000.

The solution development period starts now and runs through April 6, 2012, and the last chance to upload your learning solution comes four days after that. Better get cracking if you want to take home that prize (oh, and potentially change the course of human-computer interaction).