Keyvan Mohajer is hoping the past nine years of research and work building a digital assistant at SoundHound will be worth the wait.
Today, SoundHound CEO Mohajer said the company is publicly releasing its Hound app, a tool that lets users search for answers to questions and complete some tasks through voice commands. Some examples include searching for restaurant results on Yelp or summoning an Uber. The app is available for Android and the iPhone.
But the app’s popularity will ride on whether conversational user interfaces catch on with the public. For the most part, the entire field still feels like it’s in beta, including the tools coming from bigger companies. The conversational UI’s rise to prominence is essentially the result of larger tech companies throwing their weight behind such tools, including Apple’s Siri, Google Now and the Amazon Echo.
“We had this vision in the year 2000,” Mohajer said. “We actually positioned the company to be the leader in that. We knew it would take many years. We were working on this well before Apple, well before Siri; we wanted to own all the core technologies. We wanted it to be a step change in quality.”
The questions can be simple, like “what time is it in Tokyo,” Mohajer said. But users can also ask increasingly complex questions. In a demonstration of the app, Mohajer asked when the sun would rise a few days before Christmas, and Hound returned a time. He also tried a mortgage calculation, listing off inputs such as the loan’s principal. The point of the demo was to show that the questions can have as much depth as the user is looking for.
Users can also follow up with additional questions. In the demo, Mohajer asked whether the restaurant had outdoor seating, and Hound returned the correct answer: it did. He attributed this to a technology called “speech to meaning,” a back-end tool that the company has been working on for the better part of a decade.
According to Mohajer, Apple and Google first convert voice to text, and then run that text through an engine that attempts to work out its meaning. Hound moves the interpretation step closer to the transcription step, attempting to determine what the user is asking faster, he said. The hope is that responses will be quicker and more accurate than those from a tool like Siri, Mohajer said.
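The difference Mohajer describes can be illustrated with a toy sketch. It is not SoundHound's, Apple's or Google's implementation: the keyword table, the function names and the use of a word list as a stand-in for audio are all assumptions made purely for illustration. The first function waits for a complete transcript before interpreting it, while the second interprets each word as it arrives and can stop as soon as it recognizes an intent.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical keyword-to-intent table, invented for this illustration.
INTENT_KEYWORDS = {
    "time": "get_time",
    "sunrise": "get_sunrise",
    "mortgage": "compute_mortgage",
}


def two_stage(words: Iterable[str]) -> Optional[str]:
    """Build the full transcript first, then interpret the finished text."""
    transcript = " ".join(words)          # stage 1: complete transcription
    for token in transcript.split():      # stage 2: interpret the text
        if token in INTENT_KEYWORDS:
            return INTENT_KEYWORDS[token]
    return None


def incremental(words: Iterable[str]) -> Tuple[Optional[str], int]:
    """Interpret words as they arrive; also report how many were consumed."""
    consumed = 0
    for token in words:
        consumed += 1
        if token in INTENT_KEYWORDS:
            # An intent is recognized before the rest of the input is read.
            return INTENT_KEYWORDS[token], consumed
    return None, consumed


query = "what time is it in tokyo".split()
print(two_stage(query))    # get_time
print(incremental(query))  # ('get_time', 2)
```

Both functions reach the same answer here. The only point of the sketch is that the second one reaches it after two words rather than after the whole utterance, which is roughly the kind of speed-up Mohajer is claiming.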
Of course, there are giants in the space that can easily put a lot of resources behind it. If Amazon decided to compete aggressively, it could pour a ton of cash into the problem and potentially leapfrog SoundHound. Mohajer’s hope is that the company has enough of a lead on Amazon and other companies.
To be sure, a lot of this can also be handled in Google Now and Siri. As a quick test, I asked Google Now for the nearest sushi restaurants with a rating above 4. I asked the same question of Siri, which returned similar answers. But Hound’s goal is to rely on its team and other developers to come up with more use cases than the ones Siri and Google Now can already answer precisely. (I also asked Google Now the sunrise question Mohajer had asked Hound earlier, and it had an answer.)
So there’s certainly a lot of work to be done. But there is already one big differentiator between Google Now, Siri and Hound: SoundHound’s virtual assistant seems, at first glance, much, much faster at recognizing and processing queries. Now, if it’s going to get ahead of other technologies and stay that way, it has to collect all the data it needs and build out its own knowledge graph similar to the way Google has.
SoundHound has the benefit of leaning on years of data it’s pulled in from its core application, which detects music and shows users what song is playing. The data has become powerful enough that users can simply hum songs into SoundHound — or the Hound app, now — and with reasonable accuracy SoundHound will show them the song.
Another goal is to get the service distributed across other applications through a set of developer tools called Houndify, Mohajer said. Hound is one example that showcases the technology to users, and that technology could proliferate across other apps, he said. SoundHound hopes to become as much a developer platform as a company with a core application.
Before leaving, Mohajer pulled out one last party trick: a way to play Rock Paper Scissors with Hound. He used it as an example of how quickly the company can put together new integrations, which it calls domains. This one took one engineer a weekend to build, and he says Hound has more than 100 of them.
“People are demanding this, users want this, users want a conversational interface,” Mohajer said. “They love the [Amazon] Echo. They are using whatever the car comes with, their cell phone comes with. The providers are really pushing it, putting ads on TV. Anyone who makes anything now is interested in a conversational interface. Whether it’s a device like a TV or a fridge or an app, they are thinking, ‘what is the next thing for my product?’”