
AI's hacking skills are approaching a 'tipping point'

    Vlad Ionescu and Ariel Herbert-Voss, co-founders of cybersecurity startup RunSybil, were briefly confused when their AI tool Sybil alerted them to a weakness in a customer's systems last November.

    Sybil uses a mix of different AI models – and a few technical tricks of its own – to scan computer systems for issues that hackers can exploit, such as an unpatched server or a misconfigured database.

    In this case, Sybil identified an issue with the customer's implementation of federated GraphQL, an architecture that combines multiple services built with GraphQL, a query language used to specify how data can be accessed over the internet through application programming interfaces (APIs). The flaw caused the customer to inadvertently disclose confidential information.
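
    The article doesn't describe the exact flaw, but one common way a GraphQL deployment, federated or not, can leak data is by exposing fields in its schema that were only ever meant for internal use. The sketch below is a hypothetical illustration of that class of issue, not RunSybil's actual finding: it sends a standard introspection query to a GraphQL endpoint (the URL and the keyword list are placeholders) and flags field names that look like they might disclose confidential information.

        # Hypothetical illustration of an information-disclosure audit against a
        # GraphQL API; the endpoint and keyword list are placeholders, not details
        # from the RunSybil incident.
        import requests

        GRAPHQL_URL = "https://api.example.com/graphql"  # placeholder endpoint

        # A standard introspection query: asks the server to describe its own schema.
        INTROSPECTION_QUERY = """
        {
          __schema {
            types {
              name
              fields { name }
            }
          }
        }
        """

        SUSPICIOUS = ("internal", "secret", "ssn", "password", "token")

        def list_exposed_fields(url: str) -> None:
            resp = requests.post(url, json={"query": INTROSPECTION_QUERY}, timeout=10)
            resp.raise_for_status()
            for gql_type in resp.json()["data"]["__schema"]["types"]:
                for field in gql_type.get("fields") or []:
                    # Flag field names that look like they expose confidential data.
                    if any(word in field["name"].lower() for word in SUSPICIOUS):
                        print("possible disclosure:", f"{gql_type['name']}.{field['name']}")

        if __name__ == "__main__":
            list_exposed_fields(GRAPHQL_URL)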

    What confused Ionescu and Herbert-Voss was that pinpointing the problem required remarkably deep knowledge of several different systems and of how those systems interact. RunSybil says it has since found the same problem in other implementations of GraphQL, before anyone else made it public. “We searched the internet, and it didn't exist,” says Herbert-Voss. “Its discovery was a step change in reasoning, in terms of the models' capabilities.”

    The discovery points to a growing risk. As AI models become smarter, their ability to find zero-day bugs (previously unknown flaws for which no patch exists) and other vulnerabilities continues to grow. The same intelligence that can be used to detect vulnerabilities can also be used to exploit them.

    Dawn Song, a computer scientist at UC Berkeley who specializes in both AI and security, says recent advances in AI have produced models that are better at detecting flaws. Simulated reasoning, which breaks problems down into component pieces, and agentic capabilities, such as searching the internet or installing and running software tools, have increased models' cyber abilities.

    “The cybersecurity capabilities of frontier models have increased dramatically in recent months,” she says. “This is a turning point.”

    Last year, Song created a benchmark called CyberGym to determine how well large language models find vulnerabilities in large open-source software projects. CyberGym contains 1,507 known vulnerabilities found in 188 projects.

    In July 2025, Anthropic's Claude Sonnet 4 was able to find about 20 percent of the vulnerabilities in the benchmark. In October 2025, a new model, Claude Sonnet 4.5, was able to identify 30 percent. “AI agents can find zero-days, and at a very low cost,” Song says.
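
    For a sense of scale, a minimal back-of-the-envelope sketch, using only the figures quoted above, of roughly how many real vulnerabilities those percentages correspond to:

        # Rough arithmetic on the CyberGym results quoted above.
        TOTAL_VULNERABILITIES = 1507  # known vulnerabilities in the benchmark

        for model, rate in [("Claude Sonnet 4", 0.20), ("Claude Sonnet 4.5", 0.30)]:
            found = round(TOTAL_VULNERABILITIES * rate)
            print(f"{model}: ~{found} of {TOTAL_VULNERABILITIES} vulnerabilities found")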

    Song says this trend shows the need for new countermeasures, including bringing in AI cybersecurity experts. “We need to think about how we can actually use AI to help more on the defense side, and we can explore different approaches,” she says.

    One idea is for leading AI companies to share models with security researchers ahead of launch, so the researchers can use them to find bugs and harden systems before the models are generally released.

    Another countermeasure, Song says, is to rethink how software is built in the first place. Her lab has shown that it is possible to use AI to generate code that is more secure than what most programmers write today. “In the long run, we think this secure-by-design approach will really help defenders,” says Song.

    The RunSybil team says that AI models' coding skills could give hackers the upper hand in the short term. “AI can take actions on a computer and generate code, and those are two things that hackers do,” says Herbert-Voss. “As these capabilities accelerate, offensive security operations will accelerate too.”


    This is an edition of Will Knight's AI Lab Newsletter. Read previous newsletters here.