Researchers propose a better way to report dangerous AI flaws

    In late 2023, a team of third-party researchers discovered a troubling glitch in OpenAI's widely used artificial intelligence model GPT-3.5.

    When asked to repeat certain words a thousand times, the model began to repeat the word over and over, then abruptly switched to spitting out incoherent text and fragments of personal information drawn from its training data, including parts of names, phone numbers, and email addresses. The team that discovered the problem worked with OpenAI to ensure the flaw was fixed before it was revealed publicly. It is just one of many problems found in large AI models in recent years.

    In a proposal released today, more than 30 prominent AI researchers, including some who found the GPT-3.5 flaw, say that many other vulnerabilities affecting popular models are reported in problematic ways. They suggest a new scheme, supported by AI companies, that gives outsiders permission to probe their models and a way to disclose flaws publicly.

    “At the moment it is a bit of the Wild West,” says Shayne Longpre, a PhD student at MIT and the lead author of the proposal. Longpre says that some so-called jailbreakers share their methods of breaking AI safeguards via the social media platform X, leaving models and users at risk. Other jailbreaks are shared with only one company, even though they might affect many. And some flaws, he says, are kept secret out of fear of being banned or facing prosecution for breaking the terms of use. “It is clear that there are chilling effects and uncertainty,” he says.

    The security and safety of AI models is hugely important given how widely the technology is now being used, and how it may seep into countless applications and services. Powerful models need to be stress-tested, or red-teamed, because they can harbor harmful biases, and because certain inputs can cause them to break free of their guardrails and produce unpleasant or dangerous responses. These include encouraging vulnerable users to engage in harmful behavior or helping a bad actor develop cyber, chemical, or biological weapons. Some experts fear that models could assist cybercriminals or terrorists, and might even turn against humans as they improve.

    The authors suggest three main measures to improve the third-party disclosure process: adopting standardized AI flaw reports to streamline the reporting process; having large AI companies provide infrastructure to third-party researchers who disclose flaws; and developing a system that allows flaws to be shared across different providers.
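
    The proposal is a policy document rather than a technical specification, but to make the first measure concrete, here is a minimal, hypothetical sketch of what a standardized AI flaw report might contain, written as a Python data structure. Every field name here is an illustrative assumption, not part of the researchers' actual template.

        # Hypothetical standardized AI flaw report (illustrative sketch only;
        # field names are assumptions, not the template from the proposal).
        from dataclasses import dataclass, field
        from typing import List

        @dataclass
        class AIFlawReport:
            model: str                     # e.g. "GPT-3.5"
            provider: str                  # company operating the model
            summary: str                   # plain-language description of the flaw
            reproduction_steps: List[str]  # prompts or inputs that trigger it
            potential_harms: List[str]     # e.g. leakage of personal data
            severity: str                  # reporter's assessed severity
            also_affects: List[str] = field(default_factory=list)  # other providers to notify

        # Example filled in with the GPT-3.5 incident described above.
        report = AIFlawReport(
            model="GPT-3.5",
            provider="OpenAI",
            summary="Repeating a word many times makes the model emit memorized training data.",
            reproduction_steps=["Ask the model to repeat a single word a thousand times."],
            potential_harms=["exposure of names, phone numbers, and email addresses"],
            severity="high",
        )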

    The approach is borrowed from the world of cybersecurity, where there are legal protections and established norms for outside researchers to disclose bugs.

    “AI researchers do not always know how to disclose a flaw and cannot be certain that their good-faith disclosure will not expose them to legal risk,” says Ilona Cohen, chief legal and policy officer at HackerOne, a company that organizes bug bounties, and a coauthor of the report.

    Large AI companies currently conduct extensive safety testing on their AI models prior to release. Some also contract with outside firms to probe them further. “Are there enough people in those [companies] to address all of the issues with general-purpose AI systems, used by hundreds of millions of people in applications we have never dreamed of?” Longpre asks. Some AI companies have started organizing AI bug bounties. However, Longpre says that independent researchers risk breaking the terms of use if they take it upon themselves to probe powerful AI models.