Computer voice-to-text technology has come a long way, and every time it gets better, new applications open up. It is still not 100 percent accurate. Hell, it’s not even 90 percent accurate. But it is accurate enough for automated voicemail transcription services to become increasingly available, and good enough that you don’t have to listen through 15 voicemails to get the gist of what they are about. Of course, voicemails are often translated incorrectly, sometimes to comic effect.
In a study comparing the accuracy of four different voice-to-text technologies (Google Voice, Preview in Microsoft Exchange, Ditech’s PhoneTag, and Yap), the one that came out on top was PhoneTag, which is now part of Ditech Networks. PhoneTag showed an 86 percent accuracy rate in translating 500 spoken messages into text. Google Voice achieved only 82 percent accuracy in its voice-to-text translations. The study evaluated only purely automated voice-to-text systems. Here’s how all four fared:
Automated Voice-to-Text Accuracy:
The study was commissioned by Ditech and carried out by William Meisel of TMA Associates. You can read his methodology in the document embedded below. Of course, a study commissioned by Google might show Google Voice coming out on top. But what I find more interesting is that 86 percent accuracy is considered something to boast about. Ditech’s Chief Strategy Officer, Jamie Siminoff (who founded the company behind PhoneTag, Simulscribe), points out that each percentage point gain in accuracy is a big deal and that his goal is to get to 90 percent accuracy. To get beyond that, it is still necessary to use humans to clean up the automated translations.
PhoneTag offers both fully automated and human-assisted transcription. One service that uses PhoneTag is Ribbit Mobile, which I’ve been using with the human-assisted transcription option turned on. I also use Google Voice on another phone. I’ve certainly noticed that the human-assisted transcriptions are incredibly accurate. They can even make sense of my three-year-old son’s messages:
Hi, daddy. Hello. We’re calling you from the kitchen. We just made, what we had just made, a banana (??). Bye. Bye.
I turned off the human-assisted option and tested some purely automated transcriptions today, so I could compare PhoneTag more fairly to Google Voice. Some messages came out pretty much the same, while for others the accuracy went way down, but I really couldn’t say that PhoneTag was noticeably better than Google Voice. I do notice the difference, though, when I have the human-assisted option turned on. So while 86 percent accuracy might be something to crow about, adding human translators to the mix is still by far the best way to go.