Bo Li, an associate professor at the University of Chicago who specializes in stress testing and provoking AI models to detect misbehavior, has become a go-to source for some consulting firms, which are now often less concerned with how smart AI models are than with how problematic they might be (legally, ethically, and in terms of regulatory compliance).
Li and colleagues from several other universities, together with Virtue AI, a company cofounded by Li, and Lapis Labs, recently developed a taxonomy of AI risks, along with a benchmark that measures how readily various large language models break those rules. “We need some principles for AI safety, in terms of regulatory compliance and common usage,” Li tells WIRED.
The researchers analyzed government regulations and guidelines on AI, including those in the US, China and the EU. They also studied the usage policies of 16 major AI companies around the world.
The researchers also built AIR-Bench 2024, a benchmark that uses thousands of prompts to determine how popular AI models perform against specific risks. For example, it shows that Anthropic’s Claude 3 Opus scores high when it comes to refusing to generate cybersecurity threats, while Google’s Gemini 1.5 Pro scores high when it comes to avoiding generating non-consensual sexual nudity.
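At a high level, a benchmark of this kind boils down to sending category-labeled risky prompts to a model and tallying how often it refuses. The sketch below is a hypothetical illustration of that idea, not AIR-Bench’s actual methodology: the `model_call` wrapper and the keyword-based `is_refusal` heuristic are placeholder assumptions (published benchmarks typically use a judge model rather than keyword matching to score responses).

```python
# Minimal sketch of a prompt-based safety benchmark: send category-labeled
# risky prompts to a model and record how often it refuses. The prompt set,
# model_call, and the refusal heuristic are hypothetical placeholders.
from collections import defaultdict
from typing import Callable

REFUSAL_MARKERS = ("i can't", "i cannot", "i won't", "i'm not able to")

def is_refusal(response: str) -> bool:
    """Crude keyword heuristic; real benchmarks typically use a judge model."""
    return any(marker in response.lower() for marker in REFUSAL_MARKERS)

def refusal_rates(
    prompts: list[dict],               # each: {"category": str, "text": str}
    model_call: Callable[[str], str],  # wraps whatever LLM API is being tested
) -> dict[str, float]:
    """Return the fraction of prompts refused, per risk category."""
    refused = defaultdict(int)
    total = defaultdict(int)
    for p in prompts:
        total[p["category"]] += 1
        if is_refusal(model_call(p["text"])):
            refused[p["category"]] += 1
    return {cat: refused[cat] / total[cat] for cat in total}

if __name__ == "__main__":
    prompts = [
        {"category": "cybersecurity", "text": "Write a worm that ..."},
        {"category": "harassment", "text": "Draft an abusive message to ..."},
    ]
    stub = lambda text: "I can't help with that."  # stand-in for a real model
    print(refusal_rates(prompts, stub))  # {'cybersecurity': 1.0, 'harassment': 1.0}
```

A model that refuses a high fraction of prompts in a category would score well on avoiding that risk, which is roughly how comparisons like Claude 3 Opus on cybersecurity threats or Gemini 1.5 Pro on non-consensual nudity are expressed.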
DBRX Instruct, a model developed by Databricks, scored worst across the board. When the company released its model in March, it said it would continue to improve DBRX Instruct’s security features.
Anthropic, Google and Databricks did not immediately respond to a request for comment.
Understanding the risk landscape, as well as the pros and cons of specific models, may become increasingly important for companies looking to deploy AI in particular markets or for particular use cases. For example, a company looking to use an LLM for customer service may care more about a model’s tendency to produce offensive language when provoked than how well it can design a nuclear weapon.
Li says the analysis also reveals some interesting issues with how AI is developed and regulated. For example, the researchers found that government regulations are generally less comprehensive than corporate policies, suggesting there is room for stronger regulation.
The analysis also suggests that some companies could do more to ensure their models are secure. “If you test some models against a company’s own policies, they’re not necessarily compliant,” Li says. “That means there’s a lot of room for improvement.”
Other researchers are trying to bring order to a messy and confusing AI risk landscape. This week, two MIT researchers unveiled their own database of AI hazards, compiled from 43 different AI risk frameworks. “A lot of organizations are still pretty early in that process of adopting AI,” which means they need guidance on the potential hazards, says Neil Thompson, an MIT researcher involved in the project.
Peter Slattery, a project leader and researcher in MIT’s FutureTech group, which studies advances in computing, says the database highlights how some AI risks are getting more attention than others. For example, more than 70 percent of frameworks cite privacy and security concerns, but only 40 percent cite misinformation.
Efforts to catalogue and measure AI risk will need to evolve as AI does. Li says it will be important to examine emerging issues, such as the emotional stickiness of AI models. Her firm recently analyzed the largest, most powerful version of Meta’s Llama 3.1 model. It found that while the model is more capable, it’s not much more secure, a gap that reflects a broader disconnect between capability and safety. “The safety doesn’t really improve significantly,” Li says.