Now head of the nonprofit Distributed AI Research, Gebru hopes that in the future humans will focus on human well-being, not robotic rights. Other AI ethicists have said they will no longer discuss conscious or superintelligent AI at all.
“There’s a pretty big gap between the current story of AI and what it can actually do,” said Giada Pistilli, an ethicist at Hugging Face, a startup focused on language modeling. “This story evokes fear, surprise and excitement at the same time, but it is mainly based on lies to sell products and take advantage of the hype.”
The result of speculation about sentient AI, she says, is an increased willingness to make claims based on subjective impressions rather than scientific rigor and evidence. It distracts from the "countless ethical and social justice questions" that AI systems pose. While any researcher is free to explore as they please, she says, "I'm just worried that focusing on this topic will make us forget what's happening while we're looking at the moon."
What Lemoine experienced is an example of what author and futurist David Brin has called the "robot empathy crisis." At an AI conference in San Francisco in 2017, Brin predicted that within three to five years, people would claim AI systems were conscious and insist they had rights. At the time, he thought those claims would come from a virtual agent designed to look like a woman or a child to maximize human empathic response, not "some guy at Google," he says.
The LaMDA incident is part of a transition period, Brin says, in which “we are increasingly confused about the line between reality and science fiction.”
Brin based his 2017 prediction on advances in language models. He expects the trend to lead to scams. If people were suckers for a chatbot as simple as ELIZA decades ago, he says, how hard will it be to persuade millions that an emulated person deserves protection or money?
“There’s a lot of snake oil out there and mixed with all the hype there are real advances,” Brin says. “Working our way through that stew is one of the challenges we face.”
And as empathetic as LaMDA seemed, people stunned by large language models should consider the case of the cheeseburger stabbing, says Yejin Choi, a computer scientist at the University of Washington. A local news broadcast in the United States covered a teenager in Toledo, Ohio, who stabbed his mother in a dispute over a cheeseburger. But the headline "Cheeseburger Stabbing" is ambiguous on its own; knowing what actually happened requires some common sense. Attempts to get OpenAI's GPT-3 model to generate text from the prompt "Breaking news: Cheeseburger stabbing" yield stories about a man being stabbed with a cheeseburger in an altercation over ketchup, and a man being arrested after stabbing a cheeseburger.
Language models sometimes make mistakes because deciphering human language can require multiple forms of common sense. To document what large language models are capable of and where they can fall short, last month more than 400 researchers from 130 institutions contributed to a collection of more than 200 tasks known as BIG-Bench, or Beyond the Imitation Game. BIG-Bench includes some traditional language model tests such as reading comprehension, as well as logical reasoning and common sense.
Researchers from the Allen Institute for AI's MOSAIC project, which measures the common-sense reasoning abilities of AI models, contributed a task called Social-IQa. They asked language models — not including LaMDA — to answer questions that require social intelligence, such as: "Jordan wanted to tell Tracy a secret, so Jordan leaned toward Tracy. Why did Jordan do this?" The team found that large language models were 20 to 30 percent less accurate than humans.