There can be little doubt that, just like Microsoft thinks touch is the future of computing, Google seems to believe voice will be the user interface of the future. Indeed, when I was in Mountain View earlier this month, a Google spokesperson challenged me to just use voice whenever possible on my phone.
For Google, all things voice now start with “Ok Google” or “Ok Glass.” With Android KitKat on flagship phones like the Moto X and Nexus 5, voice recognition isn’t something you have to trigger with a tap anymore; the phone is always listening, waiting for you to talk to it.
I also went to see a screening of Google And The World Brain over the weekend, a 2013 documentary about Google’s controversial book-scanning project. The only person Google made available for the film was Amit Singhal, a Google VP and the head of its core ranking team. In the movie, Singhal doesn’t actually mention Google Books; instead, he talks about how the Star Trek computer was a major influence on his research. That, plus Google’s challenge to use voice commands whenever possible, made me think a bit more seriously about all of the work Google (and arguably Apple and others) has recently been doing around voice recognition and natural language processing.
In the early days of voice recognition and Apple’s Siri, talking to your phone or computer always felt weird. There’s just something off about talking to an inanimate object that barely understands what you want anyway. Early voice recognition tools were also so limited that it took Zen-like focus on your pronunciation, and sticking more or less to the approved commands, to get them to work. Just ask anybody who has voice recognition in their car how much they enjoy it (though don’t ask anybody with an older Ford SYNC system; they may throw a fit).
Solving those kinds of hard problems is what tends to motivate Google, though. As I noted a few months ago, one of Google’s missions is to build the ultimate personal assistant, and to do that, it has to perfect voice recognition and – more crucially – the natural language processing algorithms behind it.
What Google’s voice commands enable you to do on the phone (and in the Chrome browser) today is pretty impressive. Say “Call Mum” and it will do just that. It’ll open web pages for you, answer complex questions thanks to Google’s massive repository of data in the Knowledge Graph, set up appointments and reminders, convert currencies, translate words and phrases, and send emails and texts.
For voice searches, it’ll just speak back the answers. That’s something few companies can replicate, simply because they can’t match Google’s Knowledge Graph. More interestingly, though, many of these actions draw you into a short conversation with your phone.
“Call Alex.” “Which Alex?” “Alex Wilhelm.” “Mobile or home?” “Mobile.” “Calling Alex.”
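A back-and-forth like this can be thought of as a simple slot-filling loop: the assistant keeps asking follow-up questions until the contact and the number are unambiguous. Here’s a minimal, purely illustrative sketch of that idea — the contact data, prompts, and function names are all invented for the example, and this is in no way Google’s actual implementation:

```python
# Illustrative slot-filling loop for a "Call Alex"-style command.
# All contact data and prompts here are made up for the example.
CONTACTS = {
    "Alex Wilhelm": {"mobile": "555-0100", "home": "555-0101"},
    "Alex Payne": {"mobile": "555-0200"},
}

def resolve_call(name, ask):
    """Narrow a spoken name down to one contact and one number,
    asking follow-up questions via `ask(prompt, options)` as needed."""
    matches = [c for c in CONTACTS if name.lower() in c.lower()]
    if not matches:
        return None
    # Ambiguous contact -> "Which Alex?"
    contact = matches[0] if len(matches) == 1 else ask(f"Which {name}?", matches)
    numbers = CONTACTS[contact]
    # Ambiguous number -> "Mobile or home?"
    kind = next(iter(numbers)) if len(numbers) == 1 else ask("Mobile or home?", list(numbers))
    return contact, numbers[kind]

# Scripted answers stand in for the user's spoken replies.
answers = iter(["Alex Wilhelm", "mobile"])
contact, number = resolve_call("Alex", lambda prompt, options: next(answers))
print(f"Calling {contact} at {number}")
```

The point of the sketch is just the shape of the interaction: the assistant only asks questions for the slots it can’t fill on its own, which is why the real thing feels like a short conversation rather than a form.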
The fact that this works, and that Google can often even resolve pronouns in extended conversations, is awesome. For now, though, it still feels weird to me. I’m not likely to use it in public anytime soon, and using it when I’m alone in my office feels even stranger.
I’m guessing that kind of hesitance will wear off over time, just like video conferencing felt weird at the beginning and now we’re all used to video chats on Skype, FaceTime and Google Hangouts.
Maybe the computer from “Her” is indeed the future of user interfaces. Either way, the natural language processing and all of the other tech that drives Google’s voice commands and search today will likely form the kernel of the artificial intelligence systems the company will one day build.
It’s no surprise Google bought the stealthy artificial intelligence startup DeepMind a few weeks ago. Google’s founders have long been interested in AI, and Larry Page is rumored to have led the DeepMind acquisition himself. Back in 2000, Page said that he believed “artificial intelligence would be the ultimate version of Google.” The company continues down this path, and many of the researchers at Google’s semi-secret X lab seem to have a strong interest in AI, too. The ideal user interface for working with these systems is probably speech.
My experiment with using only voice control for a day pretty much failed, however. Not because it didn’t work well, but because I simply don’t feel like talking to my phone most of the time.