New Affectiva cloud API helps machines understand emotions in human speech

Affectiva, the startup that spun out of the MIT Media Lab several years ago with tools designed to understand facial emotions, announced a new cloud API today that can detect a range of emotions in human speech.

When we speak, our voices offer subtle and not-so-subtle cues about our emotions. Whether a voice is tight, loud or soft can give valuable clues about the speaker’s feelings. Humans can sometimes (although not always) detect those emotions, but computers have traditionally not been very good at it.

Alexa isn’t terribly funny because the technology doesn’t understand humor or tone, and can’t tell when you’re joking versus asking a genuine question. With Affectiva’s new tech, voice assistants, bots and other devices powered by artificial intelligence might soon be able to hear and understand our emotions, and derive more meaning from our requests, company CEO and co-founder Dr. Rana el Kaliouby told TechCrunch.

“Amazon [and other companies] knows if it wants it to be persuasive to try a product or route, it needs to have a relationship [with you]. To have a relationship, it needs to understand your emotional state, which is what humans do, have a real-time understanding of an emotional state. Are you annoyed, frustrated, confused?” Kaliouby explained.

Amazon isn’t alone. Carmakers are interested in knowing your emotional state behind the wheel, and that of your passengers, since those factors could have an impact on safety in the car. Any company could use a better understanding of the customers calling into its call centers or dealing with a customer service bot (they would often find me annoyed).

About a year ago, the company decided to begin studying how a machine might detect emotion from the qualities of a spoken voice. This is no easy task. There are different languages and a variety of cultural cues, which aren’t necessarily consistent from country to country or culture to culture.

The company has been collecting data from the public domain and from the data sets it built around the world for its emotional facial recognition research. Teams of people listen to each test subject and identify the emotion. To avoid bias, each labeler goes through a training program, and for each item in the test set, at least three of five labelers have to agree on the emotional state, she said.
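To make that agreement rule concrete, here is a minimal sketch in Python of a three-of-five consensus check. It illustrates the rule as described, not Affectiva’s actual labeling pipeline, and the function and label names are assumptions:

```python
from collections import Counter

def consensus_label(labels, min_agreement=3):
    """Return the emotion tag if enough labelers agree on it, else None.

    `labels` holds the emotion tags assigned by the human labelers
    (typically five) for a single audio clip.
    """
    if not labels:
        return None
    # Find the most common tag and how many labelers chose it.
    label, count = Counter(labels).most_common(1)[0]
    return label if count >= min_agreement else None

# Three of five labelers agree, so the clip keeps the "angry" label.
print(consensus_label(["angry", "angry", "angry", "frustrated", "neutral"]))
# No tag reaches three votes, so the clip would be discarded or re-reviewed.
print(consensus_label(["angry", "frustrated", "neutral", "happy", "sad"]))
```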

Affectiva understands that the data it has gathered to this point is only the beginning. Today’s API announcement is also about finding partners to help push the work further along. “We are starting with a cloud-based API because we are looking for data partners interested in partnering around data and emotion classifiers,” she said.
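For a sense of what a cloud API like this typically looks like from a developer’s side, the sketch below posts an audio clip to an HTTP endpoint and reads back per-emotion scores. Everything here, the endpoint URL, the request fields and the response shape, is a hypothetical illustration, not Affectiva’s documented interface:

```python
import requests

# Hypothetical endpoint; a real integration would use the URL and
# credentials supplied by the API provider.
API_URL = "https://api.example.com/v1/speech-emotion"

def analyze_clip(audio_path, api_key):
    """Upload an audio clip and return the service's emotion scores."""
    with open(audio_path, "rb") as f:
        response = requests.post(
            API_URL,
            headers={"Authorization": f"Bearer {api_key}"},
            files={"audio": f},
        )
    response.raise_for_status()
    # An assumed response shape, e.g. {"anger": 0.72, "joy": 0.05, ...}
    return response.json()

# Example usage (requires a real endpoint and key):
# scores = analyze_clip("support_call.wav", api_key="YOUR_KEY")
# print(scores)
```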

All of this research is great in theory, but there are also many ethical questions around machines detecting our emotions in our faces and our speech, and Kaliouby understands this. The company has strict guidelines about what information it gathers and how it uses it.

Affectiva is also running a one-day Emotion AI Summit today at the MIT Media Lab in Cambridge, where a variety of speakers will discuss the implications of this kind of technology for society.