MIT develops a speech recognition chip that uses a fraction of the power of existing technologies

MIT announced today that it has developed a speech recognition chip capable of real-world power savings of between 90 and 99 percent over existing technologies. Voice technology has, of course, become nearly ubiquitous in mobile devices, thanks to the exponential growth of smart assistants like Siri, Alexa and Google Home – but the new chip could help the technology branch out into much simpler electronics.

The team points to IoT devices as a potential use case – devices designed to go months on end without charging or changing batteries. Here’s MIT professor Anantha Chandrakasan on the chip:

Speech input will become a natural interface for many wearable applications and intelligent devices. The miniaturization of these devices will require a different interface than touch or keyboard. It will be critical to embed the speech functionality locally to save system energy consumption compared to performing this operation in the cloud.

The technology features a “voice activity detection” circuit capable of distinguishing ambient noise from speech, turning on on-board speech recognition hardware when it detects the latter.
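The article doesn't describe how MIT's voice activity detection circuit actually works, but the gating idea can be illustrated in software. The sketch below is a minimal stand-in that assumes a simple per-frame energy threshold; the function names, the 160-sample frame length and the threshold value are all hypothetical choices for illustration, not details of the chip.

```python
import math

def frame_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(s * s for s in frame) / len(frame)

def detect_speech(samples, frame_len=160, threshold=0.01):
    """Return one boolean per full frame: True when the frame's energy
    exceeds the threshold. ASSUMPTION: a plain energy threshold, which is
    far cruder than a real voice activity detector."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy(frame) > threshold)
    return flags

def gate_recognizer(flags):
    """Indices of frames for which a (hypothetical) recognizer would be
    switched on; for every other frame it would stay powered down."""
    return [i for i, is_speech in enumerate(flags) if is_speech]

# Synthetic input: near-silent noise followed by a louder 200 Hz tone
# (sampled at 16 kHz) standing in for speech.
quiet = [0.001 * math.sin(0.3 * n) for n in range(320)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(320)]

flags = detect_speech(quiet + loud)
active_frames = gate_recognizer(flags)
```

Here the first two frames fall below the threshold and the last two exceed it, so only the second half of the signal would wake the recognizer. A real detector has to separate speech from loud non-speech noise, which a bare energy threshold cannot do.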

Michael Price, a graduate student who worked on the project, gave TechCrunch a bit more detail regarding the system’s built-in speech recognizer:

The chip that we demonstrated includes a continuous speech recognizer based on hidden Markov Models (HMMs).  It transcribes an arbitrary length audio input into a sentence.  The transition model is a weighted finite-state transducer (WFST).  The acoustic model is a feed-forward neural network.  The same general techniques are used in some software speech recognizers.

We trained models for this recognizer using Kaldi, an open source toolkit.  We used a few different speech datasets for training and testing.  The largest recognizer we tested had a vocabulary of 145k words and required 7.78 mW for real-time operation.  The smallest was a digit recognizer (11 words including “oh” for zero) which required 172 uW.
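For readers unfamiliar with HMM-based recognition, here is a toy sketch of Viterbi decoding, the standard way to find the most likely state sequence in a hidden Markov model. It is not the chip's implementation: Price describes a WFST transition model and a neural-network acoustic model, while this example assumes two made-up states and hand-set probability tables purely for illustration.

```python
import math

def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_path, log_probability) for an HMM, working in log space."""
    # best[s] = log-probability of the best path so far that ends in state s
    best = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        prev_best, best, pointers = best, {}, {}
        for s in states:
            # Pick the predecessor that maximises the path score into s.
            prev = max(states, key=lambda p: prev_best[p] + log_trans[p][s])
            best[s] = prev_best[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        backpointers.append(pointers)
    # Trace back from the best final state.
    last = max(states, key=lambda s: best[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]

def logs(table):
    """Convert a dict (or dict of dicts) of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}

# HYPOTHETICAL toy model: two states and two observation symbols.
states = ["sil", "speech"]
log_start = logs({"sil": 0.8, "speech": 0.2})
log_trans = logs({"sil": {"sil": 0.7, "speech": 0.3},
                  "speech": {"sil": 0.2, "speech": 0.8}})
log_emit = logs({"sil": {"low": 0.9, "high": 0.1},
                 "speech": {"low": 0.2, "high": 0.8}})

path, score = viterbi(["low", "high", "high", "low"],
                      states, log_start, log_trans, log_emit)
# path is ["sil", "speech", "speech", "sil"]
```

In a real recognizer the emission scores would come from the acoustic model and the transition structure from the WFST, with a search space many orders of magnitude larger than this two-state example.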

The chip is essentially designed to stay always on in a low-power mode, waking its speech recognition hardware only when voice is detected. That makes it well suited to technologies like wearable devices, which can benefit from speech control but need to last much longer on a single charge than a standard handset.
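To put the quoted power figures in perspective, here is a rough back-of-the-envelope calculation. The 7.78 mW and 172 uW numbers come from Price; the 600 mWh battery capacity is a hypothetical value chosen only to make the arithmetic concrete, and the estimate ignores everything else a real device draws power for (microphone, radio, display, battery self-discharge).

```python
# Figures quoted by Michael Price for real-time operation.
LARGE_VOCAB_MW = 7.78   # 145k-word recognizer, in milliwatts
DIGIT_MW = 0.172        # 11-word digit recognizer (172 uW), in milliwatts

# ASSUMPTION: a hypothetical small battery, not a figure from MIT.
BATTERY_MWH = 600.0

def hours_of_runtime(power_mw, battery_mwh=BATTERY_MWH):
    """Hours a battery would last if the recognizer were the only load."""
    return battery_mwh / power_mw

ratio = LARGE_VOCAB_MW / DIGIT_MW                # roughly 45x
large_hours = hours_of_runtime(LARGE_VOCAB_MW)   # roughly 77 hours
digit_days = hours_of_runtime(DIGIT_MW) / 24     # roughly 145 days
```

Under these assumptions, continuous large-vocabulary recognition would drain the battery in a few days, while the digit recognizer could run for months, which is why keeping the full recognizer switched off until speech is detected matters so much.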