Google updates its Cloud Speech API with support for more languages, word-level timestamps

Google’s Cloud Speech API, which has allowed developers to use Google’s services to transcribe spoken words into text since its launch in 2016, is getting a major update today.

The most interesting change is probably the addition of support for 30 new languages on top of the 89 languages the service already understood (though, to be fair, Google includes multiple regional variants of English, Spanish and Arabic in its total count). These new languages include the likes of Bengali, Latvian and Swahili and, according to Google, cover about a billion speakers.

On top of this, Google also introduced a few new core features to the service. Among these is support for word-level timestamps. The idea here is to tag every word with its timestamp so that developers can, for example, easily allow their users to hear what a given word sounded like. That’s especially interesting for human-augmented transcription and translation services that use this API to speed up their workflows. “Having the ability to map the audio to the text with timestamps significantly reduces the time spent proofreading transcripts,” says Happy Scribe co-founder André Bastie, whose company uses the API for its $0.10/minute interview transcription service.
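To give a sense of what word-level timestamps make possible, here is a minimal Python sketch. It does not call the Speech API; it assumes the transcript has already been turned into a list of words with start and end times in seconds. The `TimedWord` type, both helper functions and the sample data are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class TimedWord:
    """One transcribed word with its position in the audio (hypothetical type)."""
    word: str
    start: float  # seconds from the start of the audio
    end: float    # seconds from the start of the audio


def find_word_spans(words: List[TimedWord], target: str) -> List[Tuple[float, float]]:
    """Return the (start, end) span of every occurrence of `target`, ignoring case."""
    target = target.lower()
    return [(w.start, w.end) for w in words if w.word.lower() == target]


def word_at(words: List[TimedWord], t: float) -> Optional[TimedWord]:
    """Return the word being spoken at time `t`, or None if `t` falls in a pause."""
    for w in words:
        if w.start <= t < w.end:
            return w
    return None


# Made-up sample data standing in for a real transcript.
transcript = [
    TimedWord("hello", 0.0, 0.4),
    TimedWord("cloud", 0.5, 0.9),
    TimedWord("speech", 0.9, 1.4),
    TimedWord("hello", 2.0, 2.3),
]

# A proofreading tool could use these spans to replay just the audio for one word.
spans = find_word_spans(transcript, "Hello")
current = word_at(transcript, 1.0)
```

A transcript editor could pass the spans to an audio player to replay a single word, or use `word_at` to highlight the word currently being spoken during playback.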

The audio files that developers upload to the service can now also be up to 3 hours long, up from 80 minutes in the previous version. Developers can also ask for a quota extension to upload files that are even longer.

As before, developers get 60 minutes of free audio processing through the Speech API, and every additional 15 seconds is billed at $0.006.
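As a rough illustration of that pricing, the Python sketch below estimates the charge for a given amount of audio. It assumes the free 60 minutes are deducted first and that a partial 15-second block is rounded up to a full one; neither detail is spelled out here, and the function name and constants are made up for the example.

```python
FREE_SECONDS = 60 * 60     # 60 free minutes of audio
BLOCK_SECONDS = 15         # billing increment
MILLS_PER_BLOCK = 6        # $0.006 per block, in tenths of a cent


def estimate_cost_usd(total_seconds: int) -> float:
    """Estimate the charge in dollars for `total_seconds` of audio.

    Assumption: partial 15-second blocks are rounded up to a full block.
    """
    billable = max(0, total_seconds - FREE_SECONDS)
    blocks = -(-billable // BLOCK_SECONDS)  # ceiling division on integers
    # Work in integer tenths of a cent to avoid floating-point drift.
    return blocks * MILLS_PER_BLOCK / 1000


# A single 3-hour file: 2 billable hours = 480 blocks of 15 seconds.
three_hour_cost = estimate_cost_usd(3 * 60 * 60)
```

Under these assumptions, each billable minute works out to four blocks, or $0.024.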