A group behind Stable Diffusion wants to open source emotion-detecting AI

In 2019, Amazon upgraded its Alexa assistant with a feature that enabled it to detect when a customer was likely frustrated — and respond with proportionately more sympathy. If a customer asked Alexa to play a song and it queued up the wrong one, for example, and then the customer said “No, Alexa” in an upset tone, Alexa might apologize — and request a clarification.

Now, the group behind one of the data sets used to train the text-to-image model Stable Diffusion wants to bring similar emotion-detecting capabilities to every developer — at no cost.

This week, LAION, the nonprofit building image and text data sets for training generative AI, including Stable Diffusion, announced the Open Empathic project. Open Empathic aims to “equip open source AI systems with empathy and emotional intelligence,” in the group’s words.

“The LAION team, with backgrounds in healthcare, education and machine learning research, saw a gap in the open source community: emotional AI was largely overlooked,” Christoph Schuhmann, a LAION co-founder, told TechCrunch via email. “Much like our concerns about non-transparent AI monopolies that led to the birth of LAION, we felt a similar urgency here.”

Through Open Empathic, LAION is recruiting volunteers to submit audio clips to a database that can be used to create AI, including chatbots and text-to-speech models, that “understands” human emotions.

“With Open Empathic, our goal is to create an AI that goes beyond understanding just words,” Schuhmann added. “We aim for it to grasp the nuances in expressions and tone shifts, making human-AI interactions more authentic and empathetic.”

LAION, an acronym for “Large-scale Artificial Intelligence Open Network,” was founded in early 2021 by Schuhmann, who’s a German high school teacher by day, and several members of a Discord server for AI enthusiasts. Funded by donations and public research grants, including from AI startup Hugging Face and Stability AI, the vendor behind Stable Diffusion, LAION’s stated mission is to democratize AI research and development resources — starting with training data.

“We’re driven by a clear mission: to harness the power of AI in ways that can genuinely benefit society,” Kari Noriy, an open source contributor to LAION and a PhD student at Bournemouth University, told TechCrunch via email. “We’re passionate about transparency and believe that the best way to shape AI is out in the open.”

Hence Open Empathic.

For the project’s initial phase, LAION has created a website that tasks volunteers with annotating YouTube clips of an individual person speaking, some pre-selected by the LAION team and others by volunteers. For each clip, volunteers can fill out a detailed list of fields, including a transcription of the clip, an audio and video description and the age, gender, accent (e.g. “British English”), arousal level (alertness, not sexual, to be clear) and valence level (“pleasantness” versus “unpleasantness”) of the person in the clip.

Other fields in the form pertain to the clip’s audio quality and the presence (or absence) of loud background noises. But the bulk of the form focuses on the person’s emotions, or at least the emotions that volunteers perceive them to have.

From an array of drop-down menus, volunteers can select individual — or multiple — emotions ranging from “chirpy,” “brisk” and “beguiling” to “reflective” and “engaging.” Noriy says that the idea was to solicit “rich” and “emotive” annotations while capturing expressions in a range of languages and cultures.

“We’re setting our sights on training AI models that can grasp a wide variety of languages and truly understand different cultural settings,” Noriy said. “We’re working on creating models that ‘get’ languages and cultures, using videos that show real emotions and expressions.”

Once volunteers submit a clip to LAION’s database, they can repeat the process anew; there’s no limit to the number of clips a single volunteer can annotate. LAION hopes to gather roughly 10,000 samples over the next few months and, optimistically, between 100,000 and 1 million by next year.

“We have passionate community members who, driven by the vision of democratizing AI models and data sets, willingly contribute annotations in their free time,” Noriy said. “Their motivation is the shared dream of creating an empathic and emotionally intelligent open source AI that’s accessible to all.”

The pitfalls of emotion detection

Aside from Amazon’s attempts with Alexa, startups and tech giants alike have explored developing AI that can detect emotions — for purposes ranging from sales training to preventing drowsiness-induced accidents.

In 2016, Apple acquired Emotient, a San Diego firm working on AI algorithms that analyze facial expressions. Affectiva, an MIT spin-out snatched up by Sweden-based Smart Eye in May 2021, once claimed its technology could detect anger or frustration in speech in 1.2 seconds. And speech recognition platform Nuance, which Microsoft purchased in April 2021, has demoed a product for cars that analyzes drivers’ emotions from their facial cues.

Other players in the budding emotion detection and recognition space include Hume, HireVue and Realeyes, whose technology is being applied to gauge how certain segments of viewers respond to certain ads. Some employers are using emotion-detecting tech to evaluate potential employees by scoring them on empathy and emotional intelligence. Schools have deployed it to monitor students’ engagement in the classroom — and remotely at home. And emotion-detecting AI has been used by governments to identify “dangerous people” and tested at border control stops in the U.S., Hungary, Latvia and Greece.

For its part, the LAION team envisions helpful, unproblematic applications of the tech across robotics, psychology, professional training, education and even gaming. Schuhmann paints a picture of robots that offer support and companionship, virtual assistants that sense when someone feels lonely or anxious and tools that aid in diagnosing psychological disorders.

It’s a techno utopia. The problem is, most emotion detection is on shaky scientific ground.

Few, if any, universal markers of emotion exist, which calls the accuracy of emotion-detecting AI into question. The majority of emotion-detecting systems were built on the work of psychologist Paul Ekman, published in the ’70s. But subsequent research, including Ekman’s own, supports the common-sense notion that there are major differences in the way people from different backgrounds express how they’re feeling.

For example, the expression supposedly universal for fear is a stereotype for a threat or anger in Malaysia. In one of his later works, Ekman suggested that American and Japanese students tend to react to violent films very differently, with Japanese students adopting “a completely different set of expressions” if someone else is in the room — particularly an authority figure.

Voices, too, cover a broad range of characteristics, including those of people with disabilities, people with conditions like autism and people who speak in other languages and dialects such as African-American Vernacular English (AAVE). A native French speaker taking a survey in English might pause or pronounce a word with some uncertainty, which could be misconstrued by someone unfamiliar as an emotion marker.

Indeed, a big part of the problem with emotion-detecting AI is bias — implicit and explicit bias brought by the annotators whose contributions are used to train emotion-detecting models.

In a 2019 study, for instance, scientists found that labelers are more likely to annotate phrases in AAVE as more toxic than their general American English equivalents. Sexual orientation and gender identity can also heavily influence which words and phrases an annotator perceives as toxic, as can outright prejudice. Several commonly used open source image data sets have been found to contain racist, sexist and otherwise offensive labels from annotators.

The downstream effects can be quite dramatic.

Retorio, an AI hiring platform, was found to react differently to the same candidate depending on what they were wearing, such as glasses or a headscarf. In a 2020 MIT study, researchers showed that face-analyzing algorithms could become biased toward certain facial expressions, like smiling, reducing their accuracy. More recent work implies that popular emotional analysis tools tend to assign more negative emotions to Black men’s faces than to white faces.

Respecting the process

So how will the LAION team combat these biases — making certain, for instance, that white people don’t outnumber Black people in the data set; that nonbinary people aren’t assigned the wrong gender; and that those with mood disorders aren’t mislabeled with emotions they didn’t intend to express?

It’s not totally clear.

Schuhmann claims the training data submission process for Open Empathic isn’t an “open door” and that LAION has systems in place to “ensure the integrity of contributions.”

“We can validate a user’s intention and consistently check for the quality of annotations,” he added.

But LAION’s previous data sets haven’t exactly been pristine.

Some analyses of LAION-400M, a LAION image training set that the group attempted to curate with automated tools, turned up photos depicting sexual assault, rape, hate symbols and graphic violence. LAION-400M is also rife with bias, for example returning images of men but not women for words like “CEO” and pictures of Middle Eastern men for “terrorist.”

Schuhmann’s placing trust in the community to serve as a check this go-around.

“We believe in the power of hobby scientists and enthusiasts from all over the world coming together and contributing to our data sets,” he said. “While we’re open and collaborative, we prioritize quality and authenticity in our data.”

As for how any emotion-detecting AI trained on the Open Empathic data set, biased or not, is used, LAION is intent on upholding its open source philosophy, even if that means the AI might be abused.

“Using AI to understand emotions is a powerful venture, but it’s not without its challenges,” Robert Kaczmarczyk, a LAION co-founder and physician at the Technical University of Munich, said via email. “Like any tool out there, it can be used for both good and bad. Imagine if just a small group had access to advanced technology, while most of the public was in the dark. This imbalance could lead to misuse or even manipulation by the few who have control over this technology.”

Where AI is concerned, laissez-faire approaches sometimes come back to bite models’ creators, as evidenced by how Stable Diffusion is now being used to create child sexual abuse material and nonconsensual deepfakes.

Certain privacy and human rights advocates, including European Digital Rights and Access Now, have called for a blanket ban on emotion recognition. The EU AI Act, the recently enacted European Union law that establishes a governance framework for AI, bars the use of emotion recognition in policing, border management, workplaces and schools. And some companies, like Microsoft, have voluntarily pulled their emotion-detecting AI in the face of public blowback.

LAION seems comfortable with the level of risk involved, though — and has faith in the open development process.

“We welcome researchers to poke around, suggest changes, and spot issues,” Kaczmarczyk said. “And just like how Wikipedia thrives on its community contributions, Open Empathic is fueled by community involvement, making sure it’s transparent and safe.”

Transparent? Sure. Safe? Time will tell.