Alexa has come a long way in four years. While Siri may have put digital voice assistants on the map, Alexa is the one that's become the useful flagship, especially when it comes to powering the smart home.
Alexa isn't perfect though. It's clear there's still a long way to go before voice assistants are truly seamless and indispensable. Amazon has a roadmap to get there, and it has a couple of key ideas it keeps in mind as it works on making Alexa better.
The most important of them is context, something Amazon shouted from the rooftops as it announced the new Echo Dot and Echo Show back at its September event. The first glimpses of this are Whisper Mode, which allows Alexa to whisper to you when you whisper to it, and Alexa Guard, which can listen for shattered glass so it can act as a security system.
If you want a sneak peek of what Alexa will do in the future, all you have to do is take a look at yourself. Toni Reid, Amazon's VP of Alexa Experience, tells The Ambient that one of the things Amazon does when designing features for Alexa is to ask "what would a human do?"
"You kind of have to put yourself in a different mindset and let the technology go away for a while and just think about 'how do we design this and what would a human do in this instance?'" she says.
Making Alexa human
If someone whispers to you, you naturally understand they're doing it for a reason. The actual reason doesn't matter too much, but you instantly recognise the context and start whispering back. Creating that human feel is what Amazon wants to do with Alexa.
But actually creating that human-like feel is difficult. For Whisper Mode, Amazon's scientists had to figure out how human vocal cords work when people are whispering. They found that whispering doesn't involve the vibration of the vocal cords, and that it carries less energy in the lower frequencies of the human voice, according to Amazon speech scientist Zeynab Raeesy. They had to teach Alexa to pick up on those signifiers and adjust to them.
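That low-frequency cue can be illustrated with a toy detector. This is only a sketch of the general idea, not Amazon's actual model: it compares how much of a signal's spectral energy sits below a cutoff frequency, on the assumption that voiced speech has strong low-frequency harmonics and whispered speech does not. The cutoff and threshold values are hypothetical.

```python
import numpy as np

SAMPLE_RATE = 16_000  # Hz; a common rate for far-field speech capture

def low_band_energy_ratio(signal, cutoff_hz=500):
    """Fraction of spectral energy below cutoff_hz.

    Voiced speech carries strong energy at low frequencies (the
    fundamental and its harmonics); whispering, which skips vocal-cord
    vibration, concentrates relatively more energy higher up.
    """
    spectrum = np.abs(np.fft.rfft(signal)) ** 2
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / SAMPLE_RATE)
    total = spectrum.sum()
    if total == 0:
        return 0.0
    return float(spectrum[freqs < cutoff_hz].sum() / total)

def looks_whispered(signal, threshold=0.25):
    # Hypothetical rule: flag the utterance as whispered when little
    # of its energy sits in the low band.
    return low_band_energy_ratio(signal) < threshold
```

A 120 Hz tone (roughly a voiced fundamental) keeps nearly all its energy in the low band, while a 2 kHz tone, standing in for whisper-like high-frequency content, does not. A production system would of course use a trained model over many such features rather than a single ratio.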
The problem is similar for Alexa Guard, which needs to be able to isolate sounds and identify them. It needs to be able to figure out that a dropped plate or a dropped drinking glass is not the same as a shattered window pane. Chieh-Chi Kao and Weiran Wang, Amazon applied scientists, say they had to use machine learning to analyse 30-second snippets of audio to help teach Alexa.
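The snippet-based approach the scientists describe can be sketched as a simple pipeline: chop a long recording into fixed-length windows, score each window, and flag the ones that cross a threshold. Everything here is illustrative; the scoring function is a stand-in for a trained model, not Amazon's detector.

```python
import numpy as np

SNIPPET_SECONDS = 30   # window length mentioned by the Amazon scientists
SAMPLE_RATE = 16_000

def split_into_snippets(audio, sample_rate=SAMPLE_RATE,
                        seconds=SNIPPET_SECONDS):
    """Chop a long recording into fixed-length snippets for scoring."""
    step = sample_rate * seconds
    return [audio[i:i + step] for i in range(0, len(audio), step)]

def score_snippet(snippet):
    """Stand-in for a trained model.

    A real detector would run learned audio features through a neural
    network; here we use peak amplitude as a toy 'sharp impact' score
    just so the pipeline is runnable.
    """
    return float(np.max(np.abs(snippet))) if len(snippet) else 0.0

def detect_glass_break(audio, threshold=0.9):
    # Return the indices of snippets whose score crosses the threshold.
    return [i for i, s in enumerate(split_into_snippets(audio))
            if score_snippet(s) > threshold]
```

The hard part, as the article notes, is the scoring step: distinguishing a shattering window from a dropped plate takes a model trained on labelled examples of both, which is where the machine learning comes in.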
Amazon can make Alexa as contextual as it likes, but one of the more frustrating things about Alexa, and voice assistants in general, is how rigid they can feel when using third-party skills. You often have to say things in a non-human way, like, "Alexa, ask iRobot Home to clean the kitchen" when you just want to say, "Alexa, clean the kitchen." One way to fix this is for Amazon to give developers access to Alexa's more advanced abilities via APIs, the interfaces that let developers plug into another company's tech.
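Dropping the "ask iRobot Home to…" preamble means the assistant itself has to work out which skill should handle a bare request. A toy version of that routing might look like the following; the skill names, intent names, and keyword-based classifier are all hypothetical, not Amazon's actual API.

```python
# Hypothetical "name-free" skill routing: map a bare utterance to
# whichever registered skill claims it can handle the intent, so the
# user never has to name the skill.

SKILL_REGISTRY = {
    "irobot_home": {"CleanRoomIntent"},
    "smart_lights": {"TurnOnLightsIntent"},
}

def classify_intent(utterance):
    """Toy keyword-based intent classifier; a real one is learned."""
    if "clean" in utterance.lower():
        return "CleanRoomIntent"
    if "light" in utterance.lower():
        return "TurnOnLightsIntent"
    return None

def route(utterance):
    """Return (skill, intent) for the first skill that can fulfil it."""
    intent = classify_intent(utterance)
    for skill, intents in SKILL_REGISTRY.items():
        if intent in intents:
            return skill, intent
    return None, intent
```

With this in place, "Alexa, clean the kitchen" resolves to the vacuum skill without the user ever saying its name, which is the kind of experience the APIs Reid describes are meant to enable.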
"It's super important to get those APIs out and get that experience distributed so that thousands of developers can invent and have some of the experiences that we do," Reid says. The more access developers have to Alexa and its contextual features, the better their skills, which means Alexa could sound and act less robotic.
One way Amazon is doing this is with a set of tools called Speech Synthesis Markup Language (SSML), which lets developers control the pitch and volume of Alexa's voice. That means developers could make Alexa whisper, pause for dramatic effect, or slow its rate of speech.
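In practice, a skill returns its speech wrapped in SSML tags inside its response. The `<amazon:effect name="whispered">` and `<prosody>` tags below are part of Alexa's documented SSML support; the small helper around them is just an illustrative sketch of how a skill might assemble a response.

```python
def build_ssml_response(text, whispered=False, rate="medium"):
    """Wrap text in SSML and return an Alexa-style response body.

    The <prosody> tag controls rate/pitch/volume; wrapping speech in
    <amazon:effect name="whispered"> makes Alexa whisper it back.
    """
    body = f'<prosody rate="{rate}">{text}</prosody>'
    if whispered:
        body = f'<amazon:effect name="whispered">{body}</amazon:effect>'
    ssml = f"<speak>{body}</speak>"
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "SSML", "ssml": ssml}
        },
    }
```

Calling `build_ssml_response("I heard that", whispered=True, rate="slow")` would, in a real skill, have Alexa whisper the reply slowly, the sort of dramatic control the article describes.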
A new responsibility
The more contextual Alexa is throughout the experience, from Amazon's own features to third-party abilities, the more human it feels. That also means Amazon's responsibility grows. Reid says people tell Alexa things about their lives.
"Customers tell us things. They're sad, they're happy. And how we respond to those things, it matters. We take that responsibility across everything," she says. If someone confides to Alexa that they're depressed, lonely or even suicidal, Amazon needs to be able to react.
Amazon already does some of this. If you tell Alexa that you're going to kill yourself, it will recommend calling a suicide prevention hotline to get help. However, it can't understand the more nuanced signs of depression and suicide. Alexa can't yet understand the emotion behind what you're saying, so it can't parse whether you're joking or being serious if you say something like, "Alexa, I don't want to wake up tomorrow."
Whisper Mode's ability to parse not what you're saying but how you're saying it is just one step toward parsing emotion, which will help Alexa understand intent and respond better than it can today.
At the same time, Alexa is a piece of technology run by a series of algorithms, and as we've seen in 2018, algorithms can go wrong and have unintended side effects. Reid says that whether people see Alexa as a member of the family or just a voice computer using the cloud, Amazon needs to be transparent about how it does things.
"For us it's about giving customers transparency and control. They need to understand why things are happening. So with Hunches you should be able to say, 'why did you do that?' Really, the more people understand why we're doing things and that they have control over those things is I think super important, and it allows that relationship to be developed and super individualised."