Sébastien Bubeck, a machine learning researcher at Microsoft, woke up one night last September thinking about artificial intelligence – and unicorns.
Bubeck had recently received early access to GPT-4, a powerful text-generation algorithm from OpenAI and an upgrade to the machine learning model at the heart of the wildly popular chatbot ChatGPT. Bubeck was part of a team working to integrate the new AI system into Microsoft’s Bing search engine. But he and his colleagues kept marveling at how different GPT-4 seemed from anything they’d seen before.
GPT-4, like its predecessors, had been fed massive amounts of text and code and trained to use the statistical patterns in that corpus to predict the words that should follow a given piece of text input. But to Bubeck, the system’s output seemed to do far more than make statistically plausible guesses.
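In the standard framing, that training boils down to a single objective: given the words so far, assign high probability to whichever word comes next. Roughly – glossing over tokenization and other details – the model’s parameters $\theta$ are tuned to maximize

\[ \sum_{t} \log p_{\theta}(w_t \mid w_1, \dots, w_{t-1}) \]

over the training corpus. It is a humble-looking objective for the behavior Bubeck was about to probe.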
That night, Bubeck got up, went to his computer, and asked GPT-4 to draw a unicorn using TikZ, a relatively obscure programming language for generating scientific diagrams. Bubeck was using a version of GPT-4 that worked only with text, not images. But the code the model presented to him, when fed into TikZ rendering software, produced a rough but distinctly unicorn-like image, cobbled together from ovals, rectangles, and a triangle. To Bubeck, such a feat surely required an abstract grasp of the elements of such a creature. “Something new is happening here,” he says. “Maybe for the first time we have something that we could call intelligence.”
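To give a flavor of what such code looks like, here is a minimal illustrative TikZ sketch – not the code GPT-4 actually produced; the shapes and coordinates are invented for illustration – assembling a crude figure from the same primitives:

\documentclass[tikz,border=5pt]{standalone}
\begin{document}
\begin{tikzpicture}
  % Body: one large oval
  \draw (0,0) ellipse (1.5 and 0.8);
  % Head: a smaller oval, up and to the right
  \draw (1.8,0.9) ellipse (0.5 and 0.35);
  % Legs: four thin rectangles hanging from the body
  \foreach \x in {-1.0,-0.4,0.4,1.0}
    \draw (\x,-0.8) rectangle (\x+0.15,-1.8);
  % Horn: a single triangle on top of the head
  \draw (2.0,1.2) -- (2.15,1.9) -- (2.3,1.2) -- cycle;
\end{tikzpicture}
\end{document}

Compiled with pdflatex, these dozen lines yield a rough quadruped with a horn – the kind of assembly Bubeck describes, though GPT-4’s actual code was its own.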
How intelligent AI is becoming – and how much trust we can place in the increasingly commonplace feeling that a piece of software is intelligent – has become a pressing, almost panic-inducing question.
After OpenAI released ChatGPT last November, then powered by GPT-3.5, it stunned the world with its ability to write poetry and prose on a wide variety of topics, solve coding problems, and synthesize knowledge from the web. But the awe came with shock and concern about the potential for academic fraud, disinformation, and mass unemployment – and fears that companies like Microsoft are rushing to develop technology that could prove dangerous.
Understanding the potential and the risks of AI’s new capabilities means having a clear grasp of what those capabilities are – and aren’t. But while there is broad agreement that ChatGPT and similar systems give computers important new abilities, researchers are only just beginning to study this behavior and determine what’s going on behind the prompt.
While OpenAI has promoted GPT-4 by touting its performance on bar and medical school exams, scientists who study aspects of human intelligence say its remarkable capabilities differ from ours in crucial ways. The tendency of the models to make things up is well known, but the divergence goes deeper. And with millions of people using the technology every day and companies betting their future on it, this is a mystery of immense importance.
Sparks of discord
Their experiences with GPT-4 inspired Bubeck and other AI researchers at Microsoft to enter the debate. A few weeks after the system was connected to Bing and its new chat feature launched, the company released a paper claiming that GPT-4 showed “sparks of artificial general intelligence” in early experiments.
The authors presented a series of examples in which the system performed tasks that seem to reflect more general intelligence, far beyond previous systems such as GPT-3. The examples suggest that, unlike most previous AI programs, GPT-4 is not limited to a specific task but can turn its hand to all sorts of problems – a necessary quality of general intelligence.