Video search is an unsolved problem. VideoSurf applies hardcore computer vision technology to this problem and finds relevant results beyond what text-based search methods can already reach. In the demo at TechCrunch50, the startup showed how you might search for a scene in the show Entourage. You drill down to the show and are then presented with thumbnails of all of the characters, arranged left to right at the top of the screen in order of importance. By clicking on a character, you get all the scenes in which that character appears, as well as related scenes.
A search starts with a keyword or normal guided navigation, but quickly becomes driven visually. Every time you click on a thumbnail, based on the visual cues, you are taken to exactly the right moment in the scene. Other thumbnails of related video snippets also appear. VideoSurf automatically adds tags and metadata to each and every frame of video. That is what is driving the search results. Once you click on the scene you want, you can watch it. Or, if you want to share a specific moment, you can set some time sliders and send exactly the scene you want off to your friends and family.
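To make the idea of per-frame metadata driving search more concrete, here is a minimal sketch in Python. It is not VideoSurf's implementation, which was not disclosed in the demo. It assumes tags have already been produced for each frame by some vision system, and it only shows how such tags could be indexed so that a query lands on an exact moment and a shareable clip can be cut around it. The names `FrameIndex`, `FrameHit`, and `clip`, the video ID, and the tags are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameHit:
    video_id: str
    seconds: float  # offset of the tagged frame from the start of the video


class FrameIndex:
    """Inverted index from a tag to the frames that carry it (illustrative only)."""

    def __init__(self):
        self._by_tag = defaultdict(list)

    def add_frame(self, video_id, seconds, tags):
        # Tags are assumed to come from an upstream vision system.
        hit = FrameHit(video_id, seconds)
        for tag in tags:
            self._by_tag[tag.lower()].append(hit)

    def search(self, *tags):
        """Return frames carrying every requested tag, ordered by video and time."""
        if not tags:
            return []
        per_tag = [set(self._by_tag.get(t.lower(), ())) for t in tags]
        common = set.intersection(*per_tag)
        return sorted(common, key=lambda h: (h.video_id, h.seconds))


def clip(hit, before=5.0, after=10.0):
    """Return (video_id, start, end) for a shareable clip around a hit.

    The padding values play the role of the time sliders; they are arbitrary.
    """
    start = max(0.0, hit.seconds - before)
    return (hit.video_id, start, hit.seconds + after)


# Hypothetical data: the video ID and tags are made up for illustration.
index = FrameIndex()
index.add_frame("show-ep1", 12.0, ["character_a", "pool"])
index.add_frame("show-ep1", 95.5, ["character_a", "character_b", "office"])
index.add_frame("show-ep1", 140.0, ["character_b"])

hits = index.search("character_a", "character_b")
print(hits)           # the single frame where both characters appear
print(clip(hits[0]))  # a clip window around that moment
```

Keying the index on tags rather than on videos is what lets a click resolve to a timestamp instead of to the start of a file. A real system would also need to rank the hits, which this sketch does not attempt.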
Robert Scoble, one of the TC50 judges, summed up his reaction (and that of much of the audience): “Finally, a Website that doesn’t suck.”
Bradley Horowitz: “I have some experience in the space. I studied computer vision at MIT. If you are looking for a Mentos video, you don’t need a Mentos detector. People capture that. I didn’t see that social [element]. Are you taking a hypertechnical approach to solving video search?”
CEO Lior Delgo: “People are not going to go and tag every frame in the video. That is what we are doing, going frame by frame and creating metadata that does not exist. You can ask everyone in the audience to tag each frame, and you won’t be able to do that. This technology has never been done before. This is the first time it is working.”
Joi Ito: “When you say look into the video, how are you doing it, how accurate is it?”
CTO Dr. Eitan Sharon: “We go everywhere from detecting scenes, get the unique and important moments, also extract the people. So the same person is grouped together, and all the appearances will be presented together. It has face recognition.”
Ito: “Obviously, it could also be used for surveillance. Can I look for ‘Scoble sex’? What will I find?”
Michael Arrington: “Zero results.”
Scoble: “How long does it take to process and index each video?”
Sharon: “Much faster than real time.”
Scoble: “Can this be done for streaming video?”
Sharon: “We can process it in real time as well.”
Sheryl Sandberg: “It is based on being able to understand the data in the first place. The reason AdSense worked is because there is a deep understanding of the content. What exists for text-based content does not exist for anything else. If you think about actually being able to metatag video, this is exciting. I also think that starting with a consumer Website is the right start.”
Horowitz: “If you look at PageRank, we look at link flux. The context is important, where does this exist in the Web. You want to go beyond just drilling down into the data. I don’t see that here.”
Arrington: “It would be nice if a computer could tell you what’s inside a picture.”
Horowitz: “Did you see Picasa? We launched face recognition last week.”
Delgo: “The information we are able to collect is way more relevant.”