OpenAI made the latest major breakthrough in artificial intelligence by increasing the size of its models to staggering proportions when it introduced GPT-4 last year. The company today announced a new advancement that signals a shift in approach: a model that can logically “reason” through many difficult problems and is significantly smarter than existing AI without requiring a major scale-up.
The new model, called OpenAI-o1, can solve problems that bedevil existing AI models, including OpenAI’s most powerful existing model, GPT-4o. Instead of producing an answer in a single step, as a large language model normally does, it reasons through the problem, effectively thinking out loud like a person would, before arriving at the correct result.
“This is what we consider the new paradigm in these models,” OpenAI Chief Technology Officer Mira Murati tells WIRED. “It’s much better at tackling very complex reasoning tasks.”
The new model, codenamed Strawberry within OpenAI, is not a successor to GPT-4o, but rather a complement to it, the company said.
Murati says OpenAI is currently building its next flagship model, GPT-5, which will be significantly larger than its predecessor. But while the company still believes that scale will unlock new capabilities in AI, GPT-5 will likely also incorporate the reasoning technology introduced today. “There are two paradigms,” Murati says. “The scaling paradigm and this new paradigm. We expect to bring them together.”
LLMs typically conjure their answers from huge neural networks fed vast amounts of training data. They can display remarkable linguistic and logical skills, but traditionally struggle with surprisingly simple problems such as rudimentary mathematical questions that require reasoning.
Murati says OpenAI-o1 uses reinforcement learning to improve its reasoning. The technique involves giving a model positive feedback when it gets an answer right and negative feedback when it gets one wrong. “The model sharpens its thinking and refines the strategies it uses to arrive at the answer,” she says. Reinforcement learning has enabled computers to play games with superhuman ability and perform useful tasks, such as designing computer chips. The technique is also a key ingredient in turning an LLM into a useful, well-behaved chatbot.
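To make the idea of positive and negative feedback concrete, here is a minimal sketch, assuming a toy setting: a “policy” chooses among three hypothetical strategies for adding two numbers, earns +1 for a correct answer and -1 for a wrong one, and shifts its preferences accordingly. It uses the textbook gradient-bandit (REINFORCE-style) update; it is an illustration of reinforcement learning in general, not a description of how OpenAI trained o1. The strategy names, the task, and the parameters are all illustrative assumptions.

```python
import math
import random

# Hypothetical strategies for answering "what is a + b?".
# Only the first is reliable; the others are shortcuts that are usually wrong.
STRATEGIES = {
    "add_digits_carefully": lambda a, b: a + b,
    "guess_first_operand": lambda a, b: a,
    "round_to_tens": lambda a, b: round(a + b, -1),
}


def softmax(prefs):
    """Turn preference scores into a probability for each strategy."""
    m = max(prefs.values())
    exps = {k: math.exp(v - m) for k, v in prefs.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


def train(steps=2000, lr=0.1, seed=0):
    rng = random.Random(seed)
    prefs = {name: 0.0 for name in STRATEGIES}  # the policy's parameters
    baseline = 0.0                              # running mean of rewards
    for t in range(1, steps + 1):
        probs = softmax(prefs)
        names = list(probs)
        choice = rng.choices(names, weights=[probs[n] for n in names])[0]
        a, b = rng.randint(0, 99), rng.randint(0, 99)
        # Positive feedback for a correct answer, negative for a wrong one.
        reward = 1.0 if STRATEGIES[choice](a, b) == a + b else -1.0
        baseline += (reward - baseline) / t
        # Gradient-bandit update: strengthen the chosen strategy when the reward
        # beats the baseline (weaken it otherwise) and move the others oppositely.
        for n in names:
            chosen = 1.0 if n == choice else 0.0
            prefs[n] += lr * (reward - baseline) * (chosen - probs[n])
    return softmax(prefs)


if __name__ == "__main__":
    for name, p in sorted(train().items(), key=lambda kv: -kv[1]):
        print(f"{name:>22}: {p:.3f}")
```

After training, nearly all of the probability mass should sit on the reliable strategy. The baseline (a running average of past rewards) is a common design choice that reduces the variance of the updates without changing which strategy is favored.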
Mark Chen, vice president of research at OpenAI, demonstrated the new model to WIRED and used it to solve several problems that the previous model, GPT-4o, couldn’t. These included an advanced chemistry question and the following mind-boggling math puzzle: “A princess is as old as the prince will be when the princess is twice as old as the prince was when the princess’s age was half the sum of their current ages. What is the age of the prince and princess?” (The correct answer is that the prince is 30 and the princess is 40.)
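The puzzle’s nested clauses can be checked step by step. The sketch below, a hypothetical helper written for illustration, translates each clause into arithmetic on the current ages and confirms that 40 and 30 work. It also shows that the wording fixes only the ratio of the ages (princess to prince is 4:3), so other pairs such as 8 and 6 satisfy the conditions too.

```python
from fractions import Fraction


def satisfies(princess, prince):
    """Check the puzzle's conditions for the given current ages."""
    x, y = Fraction(princess), Fraction(prince)
    # "when the princess's age was half the sum of their current ages"
    years_ago = x - (x + y) / 2        # how long ago that was
    prince_then = y - years_ago        # the prince's age at that moment
    # "when the princess is twice as old as the prince was [then]"
    years_ahead = 2 * prince_then - x  # how far in the future that is
    prince_later = y + years_ahead     # the prince's age at that moment
    # "a princess is as old as the prince will be [then]"
    return years_ago >= 0 and years_ahead >= 0 and x == prince_later


print(satisfies(40, 30))  # True

# Search integer ages up to 100 with the princess older than the prince.
solutions = [(x, y) for x in range(1, 101) for y in range(1, x) if satisfies(x, y)]
print(solutions)
```

For 40 and 30, the princess was 35 five years ago, when the prince was 25; in ten years she will be 50, twice 25, and the prince will then be 40, her current age.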
“The [new] model learns how to think independently, rather than trying to imitate the way people think,” as a conventional LLM does, Chen says.
OpenAI says the new model performs significantly better on a number of problem sets, including those focused on coding, mathematics, physics, biology and chemistry. On the American Invitational Mathematics Examination (AIME), a competition for high school math students, GPT-4o solved an average of 12 percent of the problems, while o1 got 83 percent correct, the company said.