Facebook has reportedly been working on a video chat device and a smart speaker that would use a new voice interface, to compete against the likes of Amazon and Google in the race to control your living room; and it’s also been testing a voice search feature for its app. Today, the company’s head of design, Luke Woods, came out bullish on the promise of voice commands, squirmed when we asked him about Alexa apps, and simply shut down with a series of no-comments on the subject of hardware that uses a voice interface.
During an interesting session at TechCrunch Disrupt that ranged across topics like copying Snap (“In design school, we learned that there is nothing new anymore.”); solving hard problems (“We treat edge cases as stress cases.”); and the push to keep changing and developing design at Facebook (“There is a tendency to think we’ve got it all figured out, but that couldn’t be farther from the truth.”), Woods was asked for his thoughts about new interfaces like voice.
He cautioned that voice does not yet have a perfect product end point, but said that services like voice search are promising.
“What’s interesting about these [newer] areas is that they hold a huge amount of promise…these are some of the problems that people are most excited to solve,” he said. “We don’t know what form [AR and VR] are going to take at the end or what’s going to work.”
Woods continued by describing voice search as “very promising. There are lots of exciting things happening…. I love to be able to talk to the car to navigate to a particular place. That’s one of many potential use cases.”
He also touched on voice navigation for Facebook, but for a particular group of users: the company has already built voice search and voice descriptors so that people with impaired vision can interact with the social network and navigate to different features.
Then things got a little less direct. “How would Facebook on Alexa work?” interviewer Josh Constine asked. “That’s a very interesting question!” Woods said, smiling. “No comment.”
Later, on the sidelines of the stage, Woods also declined, with another smile, to comment on whether Facebook is working on hardware of any kind that could use voice services.
The lack of response is at odds with some of the other evidence that has been piling up around Facebook and what it may be looking to launch in coming months and years. In the range of tests that we and others have spotted Facebook running in different countries, what’s notable is the profusion of voice-based services that are coming out.
In July, a report emerged out of Taiwan that the company was building a speaker with a screen that would let users message each other and browse photos, which sounded somewhat like a Facebook equivalent of Amazon’s Echo Show. The speaker, designed in the company’s secretive Building 8 hardware lab, was supposedly being made for a Q1 2018 launch.
Just a week later, Bloomberg appeared to corroborate the report out of Taiwan and added an interesting twist: in addition to the speaker with a screen, there was a standalone speaker with no screen, akin to Amazon’s Echo and Google’s Home hub.
Then, late in August, Business Insider discovered that the speaker device was being built under the codename Aloha, with a launch date of May 2018.
The voice noise did not end there. Earlier this month, Facebook code super sleuth Matt Navarra noted that Facebook’s iOS app contained a large amount of code related to voice search. This code appears to be separate from the accessibility voice features that Facebook has built.
Facebook has made at least one interesting acquisition in speech and natural language recognition: the AI and natural language startup Wit.ai, whose team helped build Facebook’s bot strategy for its Messenger app. In addition to that, it has been looking to hire engineers with expertise in speech and natural language processing.
These job listings mention other speech-based services that Facebook is working on, such as captioning for videos that are played with the sound off, but the expertise has wide applicability beyond that.