Searching these days is an incredibly service-intensive process. Where searching something once meant going through its drawers or folders by hand and inspecting things by eye, now it means simply producing a query and letting the vast computational engines of cloud services work in parallel, sifting through petabytes of data and presenting you with results almost instantly, ordered and arranged like snacks on a platter. We’re spoiled, to say the least.
It’s not enough, however, to have computers blindly compare 1s and 0s; when humans search, they search intelligently. Computers have made impressive leaps in this direction, and in visual search we’ve seen some interesting and practical technologies in Microsoft’s Photosynth and Google’s Search by Image. Now some researchers at CMU have taken another step in the education of our tools. Their work, being presented at SIGGRAPH Asia, cleaves even closer to human visual cognition, though there’s still a long way to go on that front.
The challenge, when comparing images for similarity, is how to determine the parts of the image that make it unique. For us this is child’s play, literally: we learn the basics of visual distinction when we are toddlers, and have decades of practice. Computer vision, on the other hand, has no such biological library to draw on and must work algorithmically.
To this end, the researchers at Carnegie Mellon have come up with an interesting way of comparing images. Instead of comparing a given image head-to-head with other images and trying to determine a degree of similarity, they turned the problem around: they compared the target image with a large number of random images and recorded the ways in which it differed most from them. If another image differs from the crowd in similar ways, chances are it’s similar to the first image. Ingenious, isn’t it?
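To make the idea concrete, here is a toy sketch of the “how does it differ from everything else” approach. It is not the researchers’ implementation. It assumes each image has already been boiled down to a short list of numbers (real systems use far richer image descriptors), and it swaps in a deliberately simple weighting scheme: each feature is weighted by how far the query strays from the average random image, divided by how much that feature normally varies. The function names and sample vectors are hypothetical, made up for illustration.

```python
from statistics import fmean, pvariance


def column_stats(negatives):
    """Per-feature mean and population variance over the random 'everything else' images."""
    columns = list(zip(*negatives))
    means = [fmean(col) for col in columns]
    variances = [pvariance(col, m) for col, m in zip(columns, means)]
    return means, variances


def distinctiveness_weights(query, means, variances, eps=1e-6):
    """Hypothetical weighting: how far the query departs from the average image on each
    feature, scaled by how much that feature usually varies (eps avoids division by zero)."""
    return [(q - m) / (v + eps) for q, m, v in zip(query, means, variances)]


def score(candidate, weights, means):
    """High when the candidate departs from the average in the same directions as the query."""
    return sum(w * (c - m) for w, c, m in zip(weights, candidate, means))


def rank(query, negatives, candidates):
    """Return candidate names ordered from most to least similar to the query."""
    means, variances = column_stats(negatives)
    weights = distinctiveness_weights(query, means, variances)
    return sorted(candidates, key=lambda name: score(candidates[name], weights, means), reverse=True)


# Toy data: 3-number "feature vectors" standing in for real image descriptors.
random_images = [[0.0, 1.0, 2.0], [1.0, 0.0, 3.0], [0.0, 1.0, 1.0], [1.0, 0.0, 2.0]]
query = [3.0, 0.5, 2.0]
candidates = {"sketch": [2.5, 0.0, 5.0], "generic": [0.5, 0.5, 2.0]}

print(rank(query, random_images, candidates))  # ['sketch', 'generic']
```

Here the query’s only unusual trait is its first feature, so the `sketch` candidate, which shares that trait, ranks first even though plain Euclidean distance would put `generic` closer (2.5 versus roughly 3.08). That is the intuition in a nutshell: what matters is not overall resemblance but resemblance in the ways that make the query stand out.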
The results are impressive. Like Google’s search tools, the system can find images with similar shapes, and like Photosynth, it can find images of the same object or location despite variations in color or angle. Beyond that, it can reliably match very different depictions of a scene, such as sketches, paintings, or photos taken in totally different seasons.
Their video explains it pretty well:
[youtube http://www.youtube.com/watch?feature=player_embedded&v=PY__Fo4o67I w=640]
Essentially, it’s an image comparison tool that acts more like a human: it identifies not the ways in which a scene is like other scenes, but how it differs from everything else in the world. It recognizes the dome of St. Peter’s whether it’s summer or winter, ballpoint pen or photograph.
Naturally there are limitations. The process is extremely CPU-intensive: while Google may return reasonably similar images in half a second, the CMU approach takes much longer because it must sift through countless images and perform complicated zone-based comparisons. But the results appear to be much more accurate and reliable, and computation time should come down as hardware and methods improve.
What will happen next? The research will almost certainly continue, and as this is a hot space right now, I wouldn’t be surprised to see these guys snapped up by one of the majors (Google, Microsoft, Flickr) in a bid to outpace the others at visual search. Update: Google is in fact one of the funders of the project, though the nature and size of its involvement have not been disclosed.
The research team consists of Abhinav Shrivastava, Tomasz Malisiewicz, Abhinav Gupta, and Alexei A. Efros, who is leading the project. The full paper is available as a PDF, and there is supplementary information and video at the project site if you’re interested.