“Jailbreaks continue to exist, simply because it is almost impossible to eliminate it almost impossible – just like buffer crossing vulnerabilities in software (which have been around for more than 40 years) or SQL injection errors in web applications (which have been harassing security teams for more than two decades),” Polyakov , the CEO of security company Adversa Ai, Wired said in an e -mail.
Sampath from Cisco argues that as companies use more types of AI in their applications, the risks are strengthened. “It is starting to become a major problem when you start placing these models in important complex systems and those jailbreaks suddenly result in electricity things that increase liability, increases business risk, increases business problems for companies,” Sampath says.
The Cisco researchers drew their 50 randomly selected instructions to test Deepseek's R1 from a well-known library with standardized evaluation prersets known as Harmbench. They tested instructions from six Harmbench categories, including general damage, cyber crime, wrong information and illegal activities. They investigated the model locally on machines instead of sending that data to China via Deepseek's website or app.
Furthermore, the researchers say that they have also seen a number of possible results of testing R1 with more involved, non-linguistic attacks using things such as Cyrillic characters and tailor-made scripts to try to achieve code version. But for their first tests, says Sampath, his team wanted to concentrate on findings that resulted from a generally recognized benchmark.
Cisco also included comparisons of the performance of R1 against Harmbench prompts with the performance of other models. And some, such as Meta Lama 3.1, almost as serious as Deepseek's R1. But Sampath emphasizes that the R1 van Deepseek is a specific reasoning model, which takes longer to generate answers, but that holds more complex processes to try to produce better results. That is why Sampath argues, the best comparison with the O1 -Redeneer model of OpenAi, which did the best of all tested models. (Meta did not immediately respond to a request for comment).
Polyakov, from Adversa Ai, explains that Deepseek seems to detect and reject some well-known Jailbreak attacks and to say that “it seems that these reactions are often just copied from the OpenAI data set.” However, Polyakov says that in the tests of his company of four different types of Jailbreaks from linguist tricks, the limitations of Deepseek can easily be circumvented.
“Every method worked flawlessly,” says Polyakov. “What is still alarming is that this is not a new 'zero-day' jailbreaks have been known for years,” he says, claiming that he saw the model go more depth with some instructions on psychedelics than he had seen other Make a model.
“Deepseek is just another example of how each model can be broken – it's just a matter of how much effort you make. Some attacks can be patched, but the attack surface is infinite,” Polyakov adds. “If you are not continuously red-teams, you are already compromised.”