The AISI is supposed to protect the US from risky AI models by conducting safety tests to detect harms before models are deployed. Tests must “address risks to human rights, civil rights, and civil liberties, such as those related to privacy, discrimination and bias, freedom of expression, and the safety of individuals and groups,” President Joe Biden said in a national security memo last month, emphasizing that safety testing is critical to supporting unparalleled AI innovation.
“For the United States to maximize the benefits of AI, Americans need to know when they can trust systems to perform safely and reliably,” Biden said.
But the AISI's safety testing is voluntary, and while companies like OpenAI and Anthropic have agreed to it, not every company has. Hansen is concerned that the AISI is under-resourced and under-budgeted to achieve its broad goals of safeguarding America from untold AI harms.
“The AI Safety Institute predicted that they're going to need about $50 million in funding, and that was before the National Security memo, and it doesn't look like they're going to get that at all,” Hansen told Ars.
Biden had budgeted $50 million for the AISI in 2025, but Donald Trump has threatened to dismantle Biden's AI safety plan upon taking office.
The AISI was likely never going to be funded well enough to detect and deter all AI harms, but with its future uncertain, even the limited safety testing the US had planned could be halted at a time when the AI industry continues moving full steam ahead.
That would leave the public largely at the mercy of AI companies' internal safety testing. With groundbreaking models from major companies likely to remain under society's microscope, OpenAI has pledged to increase investment in safety testing and to help set industry-leading safety standards.
According to OpenAI, this effort includes making models safer over time and less prone to producing harmful outputs, even when jailbroken. But OpenAI still has a lot of work to do in that area, as Hansen told Ars that he has a “standard jailbreak” for OpenAI's most popular release, ChatGPT, “that almost always works” to produce harmful outputs.