Skip to content

Google DeepMind's game-playing AI addresses chatbot blind spot

    A few years before ChatGPT started babbling, Google developed a very different kind of artificial intelligence program: AlphaGo. This program learned to play the board game Go with superhuman skill through tireless practice.

    Researchers from the company have now published a study that combines the capabilities of a large language model (the AI ​​behind today's chatbots) with those of AlphaZero, a successor to AlphaGo that can also play chess, to solve very difficult mathematical proofs.

    Their new Frankensteinian creation, called AlphaProof, has proven its worth by tackling several problems from the 2024 International Mathematical Olympiad (IMO), a prestigious competition for high school students.

    AlphaProof uses the Gemini large language model to translate naturally formulated mathematical questions into a programming language called Lean. This provides the training material for a second algorithm to learn, through trial and error, how to find proofs that can be confirmed as correct.

    Earlier this year, Google DeepMind unveiled another mathematical algorithm called AlphaGeometry that also combines a language model with another AI approach. AlphaGeometry uses Gemini to transform geometric problems into a shape that can be manipulated and tested by a program that processes geometric elements. Google also announced a new and improved version of AlphaGeometry today.

    The researchers found that their two math programs could produce proofs for IMO puzzles as well as a silver medalist. Of the six problems, AlphaProof solved two algebra problems and one number theory problem, while AlphaGeometry solved a geometry problem. The programs solved one problem in minutes, but others took several days to solve. Google DeepMind has not disclosed how much computing power it unleashed on the problems.

    Google DeepMind calls the approach used for both AlphaProof and AlphaGeometry “neuro-symbolic” because they combine the pure machine learning of an artificial neural network, the technology that has powered most advances in AI in recent times, with the language of conventional programming.

    “What we’ve seen here is that you can take the approach that was so successful, and things like AlphaGo, and combine it with large language models and produce something that’s extremely capable,” says David Silver, the Google DeepMind researcher who led the work on AlphaZero. Silver says the techniques demonstrated with AlphaProof should in theory be extended to other areas of mathematics.

    Indeed, the research offers the opportunity to address the worst tendencies of large language models by applying logic and reasoning in a more grounded way. As wonderful as large language models can be, they often struggle to understand even basic mathematics or to reason out problems logically.

    In the future, the neural-symbolic method could provide a way for AI systems to convert questions or tasks into a form that can be reasoned about in a way that produces reliable results. OpenAI is also rumored to be working on such a system, codenamed “Strawberry.”

    There is, however, a key limitation with the systems unveiled today, as Silver acknowledges. Mathematical solutions are either correct or incorrect, leaving AlphaProof and AlphaGeometry to find their way to the right answer. Many real-world problems, such as figuring out the ideal route for a trip, have many possible solutions, and which one is ideal can be unclear. Silver says the solution for more ambiguous questions might be for a language model to try to determine what a “correct” answer is during training. “There’s a spectrum of different things that can be tried,” he says.

    Silver is also careful to note that Google DeepMind isn’t going to put human mathematicians out of a job. “We’re aiming to provide a system that can prove anything, but that’s not the end of what mathematicians do,” he says. “A big part of mathematics is posing problems and finding interesting questions to ask. You could think of this as another tool along the lines of a slide rule or a calculator or computational tools.”

    Updated 7/25/24 1:25 p.m. ET: This story has been updated to clarify how many issues AlphaProof and AlphaGeometry fixed and what types of issues they were.