A new “empathic speech interface” launched today by Hume AI, a New York startup, makes it possible to add a range of emotionally expressive voices, plus an emotionally attuned ear, to large language models from Anthropic, Google, Meta, Mistral, and OpenAI, heralding an era when AI helpers may come to gush over us more often.
“We specialize in building empathetic personas that speak in ways that humans would speak, rather than stereotypes of AI assistants,” says Hume AI co-founder Alan Cowen, a psychologist who has co-authored a number of research papers on AI and emotion, and who previously worked on emotional technologies at Google and Facebook.
WIRED tested Hume’s latest speech technology, called EVI 2, and found its voice interface similar to the one OpenAI has developed for ChatGPT. (When OpenAI gave ChatGPT a flirty voice in May, the company’s CEO, Sam Altman, praised the interface as feeling “like AI from the movies.” Later, a real-life movie star, Scarlett Johansson, claimed that OpenAI had stolen her voice.)
Like ChatGPT, Hume is much more emotionally expressive than most conventional voice interfaces. For example, if you tell it that your pet has died, it will take on an appropriately somber and sympathetic tone. (And like ChatGPT, you can interrupt Hume mid-speech, at which point it will pause and adapt with a new response.)
OpenAI hasn’t said how much its voice interface tries to gauge users’ emotions, but Hume’s is explicitly designed to do so. During interactions, Hume’s developer interface displays values measuring things like “determination,” “fear,” and “happiness” in the user’s voice. If you speak to Hume in a sad tone, it will pick that up, too, something ChatGPT does not appear to do.
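Hume has not published the exact schema behind the values WIRED saw in its developer interface, but conceptually the output resembles a set of per-utterance emotion scores that an application can act on. The sketch below is a minimal illustration of that idea; the payload shape, field names, and threshold are assumptions made for the example, not Hume’s actual API.

```python
# Minimal sketch of consuming per-utterance emotion scores.
# The payload shape, field names, and threshold are illustrative
# assumptions, not Hume's documented API.

SAMPLE_UTTERANCE = {
    "text": "My dog passed away last night.",
    "emotion_scores": {  # hypothetical 0-1 scores per emotion
        "sadness": 0.81,
        "determination": 0.12,
        "fear": 0.22,
        "happiness": 0.03,
    },
}


def dominant_emotion(utterance: dict, threshold: float = 0.5) -> str | None:
    """Return the strongest emotion label if it clears the threshold."""
    scores = utterance.get("emotion_scores", {})
    if not scores:
        return None
    label, value = max(scores.items(), key=lambda item: item[1])
    return label if value >= threshold else None


def pick_reply_tone(utterance: dict) -> str:
    """Map the dominant detected emotion to a response style for the agent."""
    emotion = dominant_emotion(utterance)
    if emotion == "sadness":
        return "somber and sympathetic"
    if emotion == "happiness":
        return "warm and upbeat"
    return "neutral and attentive"


if __name__ == "__main__":
    print(pick_reply_tone(SAMPLE_UTTERANCE))  # -> "somber and sympathetic"
```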
Hume also makes it easy to give the voice specific emotional traits by adding a prompt in its UI. Here is how it sounded when I asked it to be “sexy and flirty”:
And when it was told to be “sad and gloomy”:
And here is the particularly nasty message you get when it is asked to be “angry and rude”:
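Under the hood, this kind of persona steering amounts to prepending an instruction to the voice agent’s system prompt. The snippet below sketches that idea with a made-up configuration helper; the function and field names are hypothetical, not part of Hume’s product.

```python
# Illustrative only: persona steering as a prompt prepended to the agent's
# instructions. The config structure here is hypothetical, not Hume's API.

BASE_INSTRUCTIONS = "You are a helpful voice assistant. Keep replies brief."


def build_session_config(persona: str) -> dict:
    """Combine a persona description (e.g. 'sad and gloomy') with base instructions."""
    return {
        "system_prompt": f"Speak in a {persona} tone. {BASE_INSTRUCTIONS}",
        "allow_interruptions": True,  # mirrors the barge-in behavior described above
    }


print(build_session_config("sad and gloomy")["system_prompt"])
```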
The technology did not always seem as polished and smooth as OpenAI's, and it occasionally behaved in strange ways; at one point, for example, the voice suddenly sped up and spewed gibberish. But if it can be refined and made more reliable, the technology has the potential to help make humanlike voice interfaces more common and varied.
The idea of recognizing, measuring, and simulating human emotions in technological systems goes back decades and is studied in a field known as “affective computing,” a term coined in the 1990s by Rosalind Picard, a professor at the MIT Media Lab.
Albert Salah, a professor at Utrecht University in the Netherlands who studies affective computing, is impressed by Hume AI's technology and recently demonstrated it to his students. “What EVI seems to do is assign emotional valence and arousal values [to the user] and then modulate the agent's speech accordingly,” he says. “It's a very interesting twist on LLMs.”
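Salah’s description suggests a two-step loop: estimate valence (pleasantness) and arousal (intensity) from the user’s speech, then nudge the agent’s delivery in response. The sketch below shows one plausible mapping from those two values to synthesis parameters such as speaking rate and pitch; the ranges and formulas are illustrative assumptions, not a description of how EVI 2 actually works.

```python
from dataclasses import dataclass

# Illustrative valence/arousal -> prosody mapping; the ranges and formulas
# are assumptions for the sake of example, not EVI 2's actual behavior.


@dataclass
class Prosody:
    rate: float   # 1.0 = normal speaking rate
    pitch: float  # 1.0 = normal pitch


def modulate_speech(valence: float, arousal: float) -> Prosody:
    """Map valence and arousal (each in [-1, 1]) to simple prosody controls.

    Higher arousal -> faster, slightly higher-pitched speech;
    lower valence -> slower, lower-pitched (more somber) speech.
    """
    valence = max(-1.0, min(1.0, valence))
    arousal = max(-1.0, min(1.0, arousal))
    rate = 1.0 + 0.25 * arousal + 0.10 * valence
    pitch = 1.0 + 0.15 * arousal + 0.10 * valence
    return Prosody(rate=round(rate, 2), pitch=round(pitch, 2))


# A sad, low-energy user (negative valence, low arousal) gets a slower,
# lower-pitched reply; an excited one gets a brighter, quicker delivery.
print(modulate_speech(valence=-0.7, arousal=-0.4))  # Prosody(rate=0.83, pitch=0.87)
print(modulate_speech(valence=0.6, arousal=0.8))    # Prosody(rate=1.26, pitch=1.18)
```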