Amazon’s Alexa is going to sound more human. This week the company announced a new set of speaking skills for the virtual assistant that will let her whisper, take a breath to pause for emphasis, adjust the rate, pitch and volume of her speech, and more. She’ll even be able to “bleep” out words – which may not be all that human, actually, but is certainly clever.
Developers get these tools through Speech Synthesis Markup Language, or SSML, a standardized markup language that lets them code Alexa’s speech patterns into their applications. With it, they can build voice apps (called “Skills” on the Alexa platform) that control the pronunciation, intonation, timing and emotion of their text responses.
Alexa already has a lot of personality, which can help endear people to their voice assistants. Taking a cue from Apple’s Siri, which surprises people with humorous responses, Alexa answers questions about herself, tells jokes, responds to “I love you,” and will even sing you a song if you ask. But her voice can still sound robotic at times, especially when she reads longer phrases and sentences that call for natural breaks and changes in tone.
As Amazon explains, developers could have used these new tools to make Alexa talk like E.T., but that’s not really the point. To ensure developers use the tools as intended – to humanize Alexa’s speaking patterns – Amazon has capped how much developers can change the rate, pitch and volume. (So no high-pitched squeaks and screams, I guess.)
In total, there are five new SSML tags: whispers, expletive beeps, emphasis, sub (which lets Alexa say something other than what’s written), and prosody. The last one controls the volume, pitch and rate of speech.
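To make the five tags more concrete, here is a minimal Python sketch that assembles an SSML response string a Skill might return. It assumes Amazon’s tag syntax as we understand it (`<amazon:effect name="whispered">`, `<say-as interpret-as="expletive">`, `<emphasis level>`, `<sub alias>` and `<prosody>`); the helper names are hypothetical, and the output should be checked against Amazon’s SSML reference before being used in a real Skill.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical helpers: each wraps plain text in one of the five new tags.
# Tag and attribute names are assumed from Amazon's SSML reference.


def whisper(text: str) -> str:
    """Whispered speech via Amazon's custom effect tag."""
    return f'<amazon:effect name="whispered">{escape(text)}</amazon:effect>'


def bleep(word: str) -> str:
    """Have Alexa bleep the word out instead of saying it."""
    return f'<say-as interpret-as="expletive">{escape(word)}</say-as>'


def emphasis(text: str, level: str = "strong") -> str:
    """Stress a word or phrase."""
    return f"<emphasis level={quoteattr(level)}>{escape(text)}</emphasis>"


def sub(written: str, spoken: str) -> str:
    """Keep `written` in the text but have Alexa say `spoken` instead."""
    return f"<sub alias={quoteattr(spoken)}>{escape(written)}</sub>"


def prosody(text: str, **attrs: str) -> str:
    """Adjust rate, pitch and/or volume (Amazon limits how far each can move)."""
    unknown = set(attrs) - {"rate", "pitch", "volume"}
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {sorted(unknown)}")
    attr_str = "".join(f" {name}={quoteattr(value)}" for name, value in attrs.items())
    return f"<prosody{attr_str}>{escape(text)}</prosody>"


def speak(*fragments: str) -> str:
    """Wrap fragments in the root <speak> element.

    Plain strings passed here are inserted as-is, so they must already be
    free of XML special characters (or escaped with `escape`).
    """
    return "<speak>" + " ".join(fragments) + "</speak>"


if __name__ == "__main__":
    # A quiz-style response using all five tags.
    print(speak(
        whisper("Here's a hint."),
        "The answer is",
        sub("Mg", "magnesium") + ".",
        prosody("Take your time.", rate="slow", volume="soft"),
        "Oh",
        bleep("darn"),
        "that was",
        emphasis("wrong") + ".",
    ))
```

Escaping inside each helper keeps a stray `&` or `<` in user-facing text from producing malformed SSML, which Alexa would otherwise reject.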
To show how these changes could work in a real Alexa app, Amazon created a quiz game template that uses the new tags and that developers can modify to try out Alexa’s new voice tricks.
In addition to the tags, Amazon introduced “speechcons” to developers in the U.K. and Germany. These are special words and phrases that Alexa knows to express in a more colorful way, making her interactions more engaging and personal. Speechcons were already available in the U.S. for a number of words, such as “abracadabra!,” “ahem,” “aloha,” “eureka!,” “gotcha,” “kapow,” “yay,” and many more.
With their arrival in the new markets, Alexa Skill developers can use regionally specific terms such as “Blimey” and “Bob’s your uncle” in the U.K. and “Da lachen ja die Hühner” and “Donnerwetter” in Germany.
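Speechcons are requested through SSML as well. The Python sketch below assumes Amazon’s `<say-as interpret-as="interjection">` syntax for speechcons. The locale table and the `regional_exclamation` helper are hypothetical and use only phrases named above. Which phrases Alexa actually supports in each locale is determined by Amazon.

```python
from xml.sax.saxutils import escape

# Hypothetical table: one speechcon per locale, taken from the phrases above.
# Amazon, not this table, decides which phrases each locale supports.
REGIONAL_SPEECHCONS = {
    "en-US": "eureka!",
    "en-GB": "Blimey",
    "de-DE": "Donnerwetter",
}


def speechcon(phrase: str) -> str:
    """Ask Alexa to deliver `phrase` as an expressive interjection."""
    return f'<say-as interpret-as="interjection">{escape(phrase)}</say-as>'


def regional_exclamation(locale: str, fallback: str = "Wow") -> str:
    """Use the locale's speechcon if one is listed, otherwise plain text."""
    phrase = REGIONAL_SPEECHCONS.get(locale)
    body = speechcon(phrase) if phrase else escape(fallback)
    return f"<speak>{body}</speak>"


if __name__ == "__main__":
    for locale in ("en-US", "en-GB", "de-DE", "fr-FR"):
        print(locale, regional_exclamation(locale))
```

Falling back to plain text rather than a speechcon avoids asking Alexa for an interjection she may not know in that locale.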
There are now over 12,000 Alexa Skills on the marketplace, but it’s unclear how many developers will actually put the new voice tags to work.
After all, humanizing Alexa depends on an active developer community. That requires Amazon to do more than build clever tricks for developers to use – it has to support an app economy, where developers build not just for fun, but because real businesses can run atop Amazon’s voice computing infrastructure.