Alexa is learning to have more natural conversations. Amazon today announced it’s rolling out the new “Conversation Mode” feature to its Echo Show 10 (3rd Gen.) devices, which allows the virtual assistant to engage in free-flowing conversations that don’t require you to say the wake word, “Alexa.” The user enables and disables this mode via voice commands, so it can be something you only turn on as needed.
The company introduced Alexa Conversations alongside other AI developments at its hardware event last year, where Amazon VP and head scientist Rohit Prasad demoed new Alexa capabilities like its more personalized answers, ability to ask clarifying questions and ability to take natural turns in a conversation.
These types of interactions are easy enough for humans, but present significant challenges for an AI.
At its event, Amazon showed off how Conversation Mode could work when two people talked about ordering a pizza.
After enabling the feature by saying, “Alexa, join our conversation,” the people discussed their pizza order, at times talking over the virtual assistant. When Alexa landed on the pizza topping they liked, a person said “that one!” and Alexa adjusted the order. Alexa also appeared to understand which questions were meant for it versus those that were part of the conversation between the two people, like “do you think a medium is going to be enough?” Then, when one person said they weren’t that hungry and wanted a smaller pizza, Alexa automatically changed the order.
Amazon says it uses a combination of visual and acoustic cues to recognize when customer speech is being directed at the device and whether a reply is expected. This is a difficult problem for an AI, the company explains, because many questions could be meant for either a device or a person, such as “How about a comedy?” in a conversation about choosing a movie.
In addition, a conversational mode feature would need to have low latency in order to more accurately detect the start of an utterance meant for Alexa. (Typically, a wake word triggers Alexa to listen.)
Amazon says it developed a method for visual device directedness by estimating the head orientation of each person in the device’s field of view.
“We trained a deep-neural-network model to infer the coefficients of the templates for a given input image and to determine the orientation of the head in the image,” the company shares in an Amazon Science blog post, offering a high-level view of the AI technology. “Then we quantized the weights of the model, to reduce its size and execution time. In our experiments, this approach reduced the false-rejection rate (FRR) for visual device directedness detection by almost 80% relative to the [standard perspective-n-point] approach.”
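To make the weight quantization step in that quote more concrete, here is a minimal Python sketch. Amazon does not say which quantization scheme it used, so the symmetric 8-bit approach, the function names and the sample weights below are all assumptions for illustration only.

```python
def quantize_int8(weights):
    """Symmetric linear quantization (an assumed scheme, not Amazon's)."""
    max_abs = max(abs(w) for w in weights)
    # One shared scale maps the largest magnitude onto the int8 limit of 127.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights; a real model would have millions of them.
weights = [0.5, -1.0, 0.25, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
```

Each weight is stored as a small integer instead of a 32-bit float, which is where the size and speed savings come from. The cost is a rounding error of at most half the scale per weight.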
Amazon also uses an audio-based device voice activity detection (DVAD) model to process the audio cues that signal whether or not Alexa should respond to speech it’s hearing. By adding this to the visual-only mode, Amazon was able to reduce false wakes due to ambient noise by 80% and cut down false wakes triggered by Alexa’s own responses by 42%, without increasing latency, it says.
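As a rough illustration of how a visual cue and an audio cue might be combined, the Python sketch below only treats speech as directed at the device when both signals agree. The field names, thresholds and the simple "both must pass" rule are hypothetical; Amazon has not published its fusion logic in this detail.

```python
from dataclasses import dataclass


@dataclass
class Cues:
    head_yaw_deg: float  # hypothetical: 0 means the speaker faces the screen
    dvad_score: float    # hypothetical: audio model confidence in [0, 1]


def is_device_directed(cues, max_yaw_deg=20.0, min_dvad=0.6):
    """Return True only when the visual and audio cues both point at the device."""
    facing_device = abs(cues.head_yaw_deg) <= max_yaw_deg
    sounds_directed = cues.dvad_score >= min_dvad
    return facing_device and sounds_directed


# Facing the screen while the TV plays in the background: audio cue vetoes.
tv_noise = Cues(head_yaw_deg=5.0, dvad_score=0.2)
# Facing the screen and speaking toward the device: both cues agree.
request = Cues(head_yaw_deg=-8.0, dvad_score=0.9)
```

Requiring agreement is one plausible way the audio model could cut false wakes from ambient noise, since a person merely looking at the screen would no longer be enough to trigger a response.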
To use Conversation Mode, users can say, “Alexa, join the conversation.” When enabled, there’s a solid blue border around the Echo Show 10 screen, and a light blue bar at the bottom of the screen, which lets you know when your requests are being sent to the cloud. When you’re finished, you can exit by saying, “Leave the conversation.”
Alexa will also automatically exit the mode if there’s no more interaction for a short period of time.
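The enter, exit and auto-exit behavior described here can be sketched as a small state machine in Python. The class name and the 30-second timeout are assumptions; Amazon only says the mode ends after "a short period of time."

```python
class ConversationMode:
    """Toy model of the join / leave / idle-timeout behavior."""

    def __init__(self, idle_timeout_s=30.0):  # timeout value is a guess
        self.idle_timeout_s = idle_timeout_s
        self.active = False
        self.last_interaction = None

    def hear(self, utterance, now):
        """Process one utterance at time `now` (seconds); return the mode state."""
        self._expire(now)
        text = utterance.lower().strip(" .!?")
        if text == "alexa, join the conversation":
            self.active = True
            self.last_interaction = now
        elif self.active and text == "leave the conversation":
            self.active = False
        elif self.active:
            # Any other speech while active counts as an interaction.
            self.last_interaction = now
        return self.active

    def _expire(self, now):
        # Exit automatically if nothing was heard within the idle window.
        if self.active and now - self.last_interaction > self.idle_timeout_s:
            self.active = False
```

Note that the exit phrase works without the wake word only while the mode is active, which mirrors how the article describes the feature.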
The company has been working on this conversational development for some time.
In July 2020, it presented a beta version of an Alexa Conversations feature to Alexa Skills developers, to help them create voice apps that allow for more natural-feeling conversations where people can talk to Alexa in a “less constrained way,” using the phrases they prefer. Before this, Amazon had developed a feature called Follow-Up Mode, which allowed people to give their Alexa smart device multiple commands in a row without having to say “Alexa” each time.
While the new Conversation Mode technology was announced last year, Amazon tells TechCrunch it’s officially launching today and the Echo Show 10 is the first device to receive it.