AI models start learning by asking themselves questions

    Even the smartest artificial intelligence models are essentially copycats. They learn by consuming examples of human work or by attempting to solve problems presented to them by human instructors.

    But perhaps AI can learn in a more human way – by coming up with interesting questions to ask itself and trying to find the right answers. A project by Tsinghua University, the Beijing Institute for General Artificial Intelligence (BIGAI), and Pennsylvania State University shows that AI can learn to reason in this way by playing with computer code.

    The researchers devised a system called Absolute Zero Reasoner (AZR) that first uses a large language model to generate challenging but solvable Python coding problems. It then uses the same model to solve these problems, checking its work by executing the code. Finally, the AZR system uses those successes and failures as a signal to refine the original model, improving its ability both to pose better problems and to solve them.
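    To make that loop concrete, here is a rough sketch of the propose-solve-verify cycle in Python. The model interface (generate_problem, generate_solution, update) and the convention that each snippet leaves its answer in a variable named result are hypothetical stand-ins for illustration; the actual AZR pipeline trains Qwen models with reinforcement learning and is considerably more involved.

```python
# Minimal sketch of the propose-solve-verify loop described above. The
# "model" interface (generate_problem, generate_solution, update) and the
# convention that a snippet stores its answer in a variable named "result"
# are illustrative assumptions, not AZR's actual implementation.

def run_snippet(code: str):
    """Execute a self-posed Python snippet; return its 'result' or None on failure."""
    scope = {}
    try:
        exec(code, scope)  # the Python executor doubles as the verifier
        return scope.get("result")
    except Exception:
        return None

def self_play_step(model):
    # 1. The model poses a coding problem plus a reference program defining the answer.
    problem, reference_code = model.generate_problem()
    expected = run_snippet(reference_code)

    # 2. The same model attempts to solve its own problem.
    attempt_code = model.generate_solution(problem)
    observed = run_snippet(attempt_code)

    # 3. The executor's verdict becomes the learning signal for both roles.
    reward = 1.0 if observed is not None and observed == expected else 0.0
    model.update(problem, attempt_code, reward)  # e.g., a reinforcement-learning update
    return reward
```

    The important design point is that the same code executor that grades the model's answer also confirms that the self-posed problem was solvable in the first place, so no human needs to supply either the questions or the answer key.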

    The team found that their approach significantly improved the coding and reasoning skills of both the 7-billion- and 14-billion-parameter versions of the open-source language model Qwen. Impressively, the resulting models even outperformed some that were fed human-curated data.

    I spoke with Andrew Zhao, a PhD student at Tsinghua University who came up with the original idea for Absolute Zero, and Zilong Zheng, a researcher at BIGAI who worked with him on the project, via Zoom.

    Zhao told me the approach is similar to how human learning goes beyond rote memorization or imitation. “At first you imitate your parents and like your teachers, but then you actually have to ask your own questions,” he said. “And eventually you can surpass those who taught you in school.”

    Zhao and Zheng noted that the idea of AI learning in this way, also called “self-play,” goes back years and was previously explored by people like Jürgen Schmidhuber, a noted AI pioneer, and Pierre-Yves Oudeyer, a computer scientist at Inria in France.

    One of the most exciting elements of the project, according to Zheng, is the way the model's problem-posing and problem-solving skills scale together. “The difficulty increases as the model becomes more powerful,” he said.

    A key challenge is that, for now, the system works only on problems that can be easily checked, such as those involving math or coding. As the project progresses, it may be possible to apply it to agentic AI tasks such as web browsing or office chores. This may involve the AI model trying to judge whether an agent's actions are correct.

    A fascinating possibility of an approach like Absolute Zero is that it could theoretically enable models to go beyond human learning. “Once we get that, it's kind of a way to achieve superintelligence,” Zheng told me.

    There are early signs that the Absolute Zero approach is catching on at some major AI labs.

    A project called Agent0, from Salesforce, Stanford and the University of North Carolina at Chapel Hill, involves an agent that uses software tools and improves itself through self-play. As with Absolute Zero, the model improves at general reasoning through experimental problem solving. A recent paper written by researchers from Meta, the University of Illinois, and Carnegie Mellon University presents a system that uses a similar kind of self-play for software engineering. The authors of this work suggest that it is “a first step toward training paradigms for superintelligent software agents.”

    Finding new ways for AI to learn will likely be a big theme in the tech industry this year. As conventional data sources become scarcer and more expensive, and as labs look for new ways to make models more capable, a project like Absolute Zero could lead to AI systems that look less like copycats and more like humans.