It’s clear that voice is becoming a major interface, as we witness the rise of the Amazon Echo, Google Home, Siri, Cortana and their ilk. We’re also seeing an increasing use of chat bots and other voice-driven tools, which often require speech recognition with a very specific vocabulary.
That’s where AssemblyAI, a member of the Summer ’17 Y Combinator class, comes in. The startup is building an API that will help developers build customized chat interfaces quickly.
“We’re building an API for customized speech recognition. Developers use our API for transcribing phone calls or creating custom voice interfaces. We help them recognize an unlimited number of custom words without any training,” Dylan Fox, AssemblyAI’s founder, told TechCrunch.
He says most off-the-shelf speech recognition APIs are designed to be one size fits all. If you want to customize one, it gets really expensive. AssemblyAI hopes to change that.
When Fox was working at his previous job as an engineer at Cisco, he saw first-hand how difficult it was to create a speech recognition program with custom words. It usually involved a lot of engineering resources and took a long time. He came up with the idea of AssemblyAI as a way to make it easier, less costly and much faster. He added that recent advancements in AI and machine learning have made it possible to do what his company is doing now.
It’s worth noting that the tool requires GPUs, rather than CPUs, for increased processing power because the task is so resource-intensive. Getting access to a sufficient number of GPUs to build and run the tasks has been a challenge for the three-person startup, but their affiliation with Y Combinator has helped in that regard. It’s also brand new tech, so they have to solve every problem they encounter on their own. There are no books to read or solutions to look up on Google.
Even though they are just three people, they believe user experience is going to be key to their success, so they have one team member fully devoted to developing the front end. They claim that no training is required to run the API. You just upload a list of terms or names and the API takes care of the rest.
Fox fully recognizes that it’s hard for a startup to build a speech recognition tool without constantly worrying about the bigger companies swooping in and grabbing its market share, but he says his company is working hard to differentiate itself as a go-to tool for developers.
“As a smaller company focused on a speech recognition technology, we can provide a better experience [than the bigger companies].” He says that means paying attention to the little things that attract developers to a tool, like better documentation, simpler integration and just making it easier to use overall.
So far the product is in private beta, with several companies deploying it on GPUs in the cloud, but it’s early days. He says when the customers come, the company will have to scale to meet that demand using additional cloud-based GPU resources. If it works as described, that shouldn’t be long now.