When it comes to advanced math, ChatGPT is not a star student

    [Image: a student standing in front of a whiteboard with equations.]

    While learning high-level math isn’t easy, teaching math concepts can often be just as challenging. That may be why many teachers turn to ChatGPT for help. According to a recent Forbes article, 51 percent of teachers surveyed stated they had used ChatGPT to teach, while 10 percent used it daily. ChatGPT can help convey technical information in more basic terms, but it doesn’t always provide the right solution, especially for top-level math.

    An international team of researchers tested what the software could handle by providing the generative AI program with challenging graduate-level math questions. While ChatGPT failed on a significant number of them, the correct answers suggested it could be useful to math researchers and teachers as a sort of specialized search engine.

    Displaying ChatGPT’s computational muscles

    The media tends to portray ChatGPT’s mathematical intelligence as either brilliant or incompetent. “Only the extremes have been emphasized,” explains Frieder Simon, a doctoral student at the University of Oxford and the study’s lead author. For example, ChatGPT passed Psychology Today’s Verbal-Linguistic Intelligence IQ Test with 147 points, but failed miserably on Accounting Today’s CPA exam. “There is a middle [road] for some use cases; ChatGPT performs quite well [for some students and educators] but for others not so much,” Simon explained.

    On high school and undergraduate-level math tests, ChatGPT performs well, ranking in the 89th percentile on the SAT math test. It even got a B on the final exam in quantum computing from technology expert Scott Aaronson.

    But several tests may be needed to reveal the limits of ChatGPT’s capabilities. “One thing the media has focused on is ChatGPT’s ability to pass several popular standardized tests,” said Leah Henrickson, a professor of digital media at the University of Leeds. “These are tests that students literally spend years preparing for. We are often led to believe that these tests evaluate our intelligence, but more often than not, they evaluate our ability to remember facts. ChatGPT can pass these tests because it can remember facts it picked up during its training.”

    Simon and his research team proposed a unique set of top-level math questions to assess whether ChatGPT also had genuine problem-solving skills. “[Previous studies looked at] whether the output was correct or incorrect,” added Simon. “And we wanted to go further and have implemented a much finer methodology that allows us to really assess how ChatGPT fails, if it fails, and how it fails.” To create a more complex testing system, the researchers gathered prompts from various fields into a larger set of problems they called GHOSTS.