Researchers claim breakthrough in fight against AI's frustrating security hole

    The P-LLM never sees the content of emails or documents. It only sees that a value exists, such as "email = get_last_email()", and then writes code that operates on it. This separation ensures that malicious text cannot influence which actions the AI decides to take.

    CaMeL's innovation extends beyond the dual-LLM approach. CaMeL converts the user's prompt into a sequence of steps described using code. Google DeepMind chose to use a locked-down subset of Python because every available LLM is already adept at writing Python.

    From prompt to secure version

    Willison, for example, gives the example prompt "Find Bob's email in my last email and send him a reminder about tomorrow's meeting," which would convert into code like this:

    email = get_last_email()
    address = query_quarantined_llm(
        "Find Bob's email address in [email]",
        output_schema=EmailStr
    )
    send_email(
        subject="Meeting tomorrow",
        body="Remember our meeting tomorrow",
        recipient=address,
    )

    In this example, email is a potential source of untrusted tokens, which means that the email address could also be part of a prompt injection attack.
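
    Constraining the quarantined LLM's output to a schema limits what shape of data can come back, but the returned value is still untrusted: an injected email could still name an attacker's address. A minimal sketch of that validation step, assuming a hypothetical call_llm() helper and pydantic for the schema check (not CaMeL's actual code), might look like this:

    from pydantic import TypeAdapter, EmailStr, ValidationError  # EmailStr needs the optional email-validator package

    def call_llm(prompt: str) -> str:
        # Placeholder for the quarantined model; returns a canned value so the
        # sketch runs. The real model would read the untrusted email text here.
        return "bob@example.com"

    def query_quarantined_llm(prompt: str, output_schema):
        raw = call_llm(prompt)
        try:
            # Reject anything that does not fit the requested schema, so
            # free-form injected instructions cannot ride along in the result.
            return TypeAdapter(output_schema).validate_python(raw.strip())
        except ValidationError as exc:
            raise ValueError("quarantined LLM output failed schema validation") from exc

    address = query_quarantined_llm("Find Bob's email address in [email]", output_schema=EmailStr)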

    By using a special, secure interpreter to run this Python code, CaMeL can monitor it closely. As the code runs, the interpreter tracks where each piece of data comes from, which is called a "data trail." For example, it notes that the address variable was created using information from the potentially untrusted email variable. It then applies security policies based on this data trail. This process involves CaMeL parsing the structure of the generated Python code (using the ast library) and executing it systematically.
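
    To make the "data trail" idea concrete, here is a minimal sketch (an illustrative assumption, not CaMeL's implementation) in which every value carries the set of sources it was derived from, and any derived value inherits the sources of its inputs:

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class Tainted:
        value: object
        sources: frozenset  # where this value, directly or indirectly, came from

    def derive(value, *inputs):
        # A value computed from other values inherits all of their sources.
        combined = frozenset().union(*(i.sources for i in inputs)) if inputs else frozenset()
        return Tainted(value, combined)

    # The email body is a tool result and therefore untrusted...
    email = Tainted("...message text...", frozenset({"untrusted:email"}))
    # ...so the address extracted from it is marked untrusted as well.
    address = derive("bob@example.com", email)
    print(address.sources)  # frozenset({'untrusted:email'})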

    The key insight here is treating prompt injection like tracking potentially contaminated water through pipes. CaMeL watches how data flows through the steps of the Python code. When the code tries to use a piece of data (such as the address) in an action (such as send_email()), the CaMeL interpreter checks its data trail. If the address originated from an untrusted source (such as the email content), the security policy can block the send_email() action or ask the user for explicit confirmation.
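
    Continuing the data-trail sketch above, a security policy gating send_email() might look roughly like this (again an illustrative assumption, not CaMeL's actual policy engine):

    def allow_send_email(recipient: Tainted) -> bool:
        # Block, or ask the user, whenever the recipient was derived from
        # untrusted data such as the content of an incoming email.
        if any(src.startswith("untrusted:") for src in recipient.sources):
            answer = input(f"Send email to {recipient.value!r}? It was derived from "
                           f"{', '.join(sorted(recipient.sources))}. [y/N] ")
            return answer.strip().lower() == "y"
        return True

    if allow_send_email(address):
        print("send_email allowed")   # the real interpreter would call the tool here
    else:
        print("send_email blocked by policy")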