In late July, OpenAI began rolling out an eerily humanlike voice interface for ChatGPT. In a safety analysis released today, the company acknowledges that this anthropomorphic voice may lure some users into becoming emotionally attached to its chatbot.
The warnings are contained in a “system card” for GPT-4o, a technical document that lays out what the company believes are the risks associated with the model, along with details of safety testing and the steps the company is taking to mitigate potential risks.
OpenAI has come under scrutiny in recent months after a number of employees working on AI’s long-term risks left the company. Some subsequently accused OpenAI of taking unnecessary chances and silencing dissenters in its race to commercialize AI. Revealing more details of its safety regime could help blunt the criticism and reassure the public that the company takes the issue seriously.
The risks explored in the new system card are wide-ranging, and include the possibility that GPT-4o could amplify societal biases, spread disinformation, and aid in the development of chemical or biological weapons. It also reveals details of tests designed to ensure that AI models won’t try to escape their controls, deceive people, or hatch catastrophic plans.
Some outside experts praise OpenAI for its transparency but say the company could go further.
Lucie-Aimée Kaffee, an applied policy researcher at Hugging Face, a company that hosts AI tools, notes that OpenAI’s system card for GPT-4o doesn’t include extensive details about the model’s training data or who owns that data. “The question of consent in creating such a large dataset spanning multiple modalities, including text, image, and speech, needs to be addressed,” Kaffee says.
Others note that risks can change as tools are deployed in the wild. “Their internal assessment should only be the first part of ensuring AI safety,” says Neil Thompson, a professor at MIT who studies AI risk assessments. “Many risks only materialize when AI is deployed in the real world. It’s important that these other risks are catalogued and evaluated as new models emerge.”
The new system card highlights how rapidly AI risks are evolving with the development of powerful new features such as OpenAI’s voice interface. In May, when the company unveiled its voice mode, which can respond swiftly and handle interruptions in a natural back-and-forth, many users noticed it came across as overly flirtatious in demos. The company later drew criticism from the actress Scarlett Johansson, who accused it of copying her speaking style.
A section of the system card titled “Anthropomorphization and Emotional Reliance” explores problems that arise when users perceive AI in human terms, something apparently exacerbated by the humanlike voice mode. During the red teaming, or stress testing, of GPT-4o, for instance, OpenAI researchers noticed instances of speech from users that conveyed a sense of emotional connection with the model, such as the phrase “This is our last day together.”
Anthropomorphism might cause users to place more trust in a model’s output when it “hallucinates” incorrect information, OpenAI says. Over time, it might even affect users’ relationships with other people. “Users might form social relationships with the AI, reducing their need for human interaction—potentially benefiting lonely individuals but possibly affecting healthy relationships,” the document says.
Joaquin Quiñonero Candela, head of preparedness at OpenAI, says that voice mode could evolve into a uniquely powerful interface. He also notes that the kind of emotional effects seen with GPT-4o can be positive, say, by helping people who are lonely or who need to practice social interactions. He adds that the company will study anthropomorphism and emotional connections closely, including by monitoring how beta testers interact with ChatGPT. “We don’t have results to share at this point, but it’s on our list of concerns,” he says.