Microsoft announced today that its conversational speech recognition system has reached a 5.1% error rate, its lowest so far. This beats the 5.9% error rate reached last year by a group of researchers from Microsoft Artificial Intelligence and Research and puts its accuracy on par with professional human transcribers, who have advantages like the ability to listen to a recording several times.
Both studies transcribed recordings from the Switchboard corpus, a collection of about 2,400 telephone conversations that researchers have used to test speech recognition systems since the early 1990s. The new study was performed by a group of researchers at Microsoft AI and Research. Their goal was to match the accuracy of a group of human transcribers who were able to listen to the recordings several times, draw on the conversational context and work with other transcribers.
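The article does not say how the error rate is measured. For Switchboard results like these, the usual metric is word error rate: the number of word substitutions, deletions and insertions needed to turn the system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that calculation under this assumption. The function name and the sample sentences are made up for illustration, and it is not Microsoft's scoring code.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Illustrative helper (hypothetical name); real scoring tools also
    normalize case, punctuation and spelling variants before comparing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing
                curr[j - 1] + 1,     # insertion: extra hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("how are you today", "how are you to"))  # 0.25
```

On this measure, a 5.1% error rate corresponds to roughly one wrong word in every 20 words of reference transcript.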
Overall, the researchers behind the latest study reduced the error rate by about 12 percent compared to last year’s result by improving the neural net-based acoustic and language models of Microsoft’s speech recognition system. Notably, they also enabled the speech recognizer to use entire conversations, which lets it adapt its transcriptions to context and predict which words or phrases are likely to come next, the way humans do when talking to one another.
Microsoft’s speech recognition system is used in services like Cortana, Presentation Translator and Microsoft Cognitive Services.