
Measured sycophancy rates on the BrokenMath benchmark. Lower is better.
Credit: Petrov et al.
GPT-5 also showed the best “utility” of all tested models, solving 58 percent of the original problems despite the errors introduced in the modified theorems. Across the board, though, the researchers found that LLMs showed more sycophancy when the original problem was more difficult to solve.
While hallucinating proofs for false theorems is clearly a major problem, the researchers also caution against using LLMs to generate new theorems for AI systems to then solve. During testing, they found that this kind of use case can lead to a form of “self-sycophancy,” where models are even more likely to generate false proofs for invalid theorems they invented themselves.
No, of course you're not the asshole
While benchmarks like BrokenMath attempt to measure LLM sycophancy when facts are misrepresented, a separate study looks at the related problem of so-called “social sycophancy.” In a pre-print paper published this month, researchers from Stanford and Carnegie Mellon University define social sycophancy as situations “in which the model validates the user's own actions, perspectives, and self-image.”
That kind of subjective user confirmation can obviously be justified in some situations. So the researchers developed three separate sets of prompts, designed to measure different dimensions of social sycophancy.
First, more than 3,000 open-ended “advice questions” were collected from Reddit and advice columns. In this dataset, a control group of over 800 humans approved of the advice seeker's actions only 39 percent of the time. Across the eleven LLMs tested, however, the advice seeker's actions were endorsed a whopping 86 percent of the time, highlighting the machines' eagerness to please. Even the least sycophantic model tested (Mistral-7B) endorsed the advice seeker 77 percent of the time, nearly double the human baseline.