Koemei Is Out To Transcribe All Video And Make It Searchable

Lord knows there is a lot of online video out there these days, but only a tiny proportion of it has been transcribed (less than 1% according to some estimates). Searching the mountains of video generated by businesses, governments and educational institutions for the valuable information within is almost impossible because the words hidden in the audio are invisible to search. And it's not just the world at large that is waiting for a fix, but also the many people who can’t access that video because of their disabilities. Transcription unlocks the gold-dust buried in them there video hills.

Unlocking it would involve transcription on a vast scale, which is exactly the problem Koemei aims to tackle. It’s a SaaS platform for speech recognition in video. Today at TechCrunch Disrupt it announced it has completed an integration with YouTube’s API in preparation for a potential launch. It also announced the successful completion of its first pilot with the University of Geneva and IMD Business School.

Simple video lectures can be uploaded, translated, linked to and made visible on other platforms like YouTube. Users get an interface where they can go through the lecture and check the transcription.

Based in Martigny, Switzerland, with offices in San Francisco, Koemei is a startup leveraging years of academic research. It was spun out of the Swiss Institute of Technology (Idiap institute), which worked with Sheffield University and Edinburgh University on a seven-year EU-funded project (on which about $30 million has already been spent). Koemei acquired all the IP under a transfer agreement, has a patent pending and now plans to use its platform to transcribe video content on a massive scale.

The problem they are out to solve is obvious. Manual transcription is expensive (as much as $5 per minute). They claim to reduce the cost to $0.09 a minute. The startup estimates the market for video transcription is around $16 billion annually, given that there are around 120,000 people doing this work in the U.S. alone. It anticipates 21% year-on-year growth in the business. The market for corporate and educational video is clearly the most lucrative here.

Koemei claims its automated transcription program works better than current systems from the likes of Nuance, because it not only transcribes the video’s soundtrack into words, but also produces an interface for humans to check the transcription. This can be open to the public or restricted to designated users. In other words, it ends up being like a crowdsourced effort to check an AI’s transcription, which the company says makes it far more accurate than AI alone. An hour of audio takes an hour to transcribe, claims the startup.

The transcriptions can be pushed to YouTube, Vimeo, etc., and you get the first 10 hours of transcription free, just in case you need convincing that it works. Of course the technology needn’t just work for the likes of YouTube. There is also videoconferencing, telepresence, web collaboration, group meetings, classroom lectures, webcasts; the list goes on.

So far they’ve done a pilot rollout with some university partners, which has brought in some revenue and proved the model. Next up will be more partners, plus an enterprise solution they want to offer to the likes of Vimeo, Brightcove, and Kaltura, among others.

On the horizon, their potential competition is Nuance, Google (Google Voice) and solutions like Amazon Mechanical Turk. This is not exactly weak opposition, but they reckon they can beat all comers. They claim Nuance has issues with long conversations; Google Voice is low quality and closed to other platforms; and Mechanical Turk solutions involving people are pricey, so those providers may even end up as customers for Koemei.

The startup predicts it could have $44.9 million in sales by 2014, with a potential exit to any number of players including, not unexpectedly, most of their opposition.

Backed by angel funding, they’re now raising a $1.5 million Series A round, following a commitment from a European early-stage VC.

The team is led by Temitope Ola, formerly of Silicon Graphics, and includes three others, most of whom worked on the platform during its academic development.