They say that news is what happens to a writer on his way to the bathroom, and I’ve recently discovered something fascinating. Alexa – and, to a degree, Siri and Google’s OK Google solution – have become indispensable additions to our home. The kids tell Alexa to turn on the lights and start Netflix. We ask her how to spell words and do basic math. She tells us which day it is and what the weather will be. She sets timers for us and reminds us to buy milk. In short, she’s our own special helper monkey.
“Forget the onerous process of pulling your Pixel or iPhone from your pocket, unlocking it, opening apps, and tapping your desires onto a screen (Ugh!),” wrote Jessi Hempel on Backchannel. “Soon, you’ll speak your wants into the air — anywhere — and a woman’s warm voice with a mid-Atlantic accent will talk back to you, ready to fulfill your commands.”
The world thought it wanted smart watches but what it really wants is to be heard. And Alexa and her ilk are only going to get more and more powerful.
Analysts estimate that Amazon has sold six million Alexa-capable devices since launch. This is a big enough number to make the Echo a fun addition to holiday festivities. Take my own home, for example. My parents fell in love with our Echo after I asked it to play Roy Orbison for my Dad and my Mom asked it whether or not we’d need an umbrella. When I ordered one for them for Christmas, my sister took it, so I had to buy them another one. Like most nascent tech, there is little impetus for the non-techie to buy or use aural interfaces outright, but once they see how it works they’re hooked.
The future of aural interfaces is clear. Our phones will soon be talking to us more and more. By slapping in some micro-earbuds – the AirPods are making more and more sense now – and a smarter interface, you can easily go your entire day without having to unlock your phone. Notifications can be read to you in a hushed voice. Top Twitter trends can hit your cranium while you’re driving. Add in a camera or NFC and you can tell which stores have sales or get details on who you’re talking to, like some kind of friendly White House aide. Once you start talking to your phone more than you talk to humans, we enter a world in which the ears, and not the eyes, become the sensory organ of choice.
This interface is obviously interstitial. Like most technologies, we’ll bump up against the edges of a voice interface fairly quickly. Until we are able to “jack” directly into our computers for some real augmented reality, however, aural interfaces are the next best thing. Voice interfaces are unobtrusive and seamless – you don’t need to know anything to talk to the Echo, but even the simplest phone requires some kind of literacy – and cellphones and cloud services are getting better and better. By melding the two we find ourselves at a perfect inflection point for the rise of voice and the fall of hunting and pecking on a glowing phone.
What we most want from our devices is freedom. We want to be able to tell them to do the things we’re thinking and get immediate results. Turning off the lights in my home takes four physical taps on my phone or one sentence to Alexa. Turning on the Star Wars theme music takes five taps on my phone or one request to Alexa. Getting the answer to “seven times eight” (we have small kids) takes a solid six seconds of tapping or two seconds of talking. Once Alexa and other bots become ubiquitous we’ll all be shouting commands into the air and expecting our homes to react.
Unfortunately, our robotic friends aren’t quite as smart as I’d like them to be. We’ve been talking about Star Wars around the house these last few days and I pointed out the Wilhelm Scream to the kids. I asked Alexa for a bit more information. She, in her wisdom, pointed me to A Wilhelm Scream, a hardcore band from New Bedford, Massachusetts. I can imagine – nay, I crave – a world in which my home voice-controlled robot howls in pain over and over while we discuss the vagaries of sound design and inside jokes. A cyborg can dream.