How a big shift in training LLMs led to a capability explosion

    In a recent article, Ars Technica's Benj Edwards explored some of the limitations of reasoning models trained with reinforcement learning. "One study, for example, found puzzling inconsistencies in how models fail. Claude 3.7 Sonnet could perform up to 100 correct moves in Tower of Hanoi but failed after just five moves in a river crossing puzzle, despite the latter requiring fewer total moves."

    Conclusion: reinforcement learning made agents possible

    One of the most discussed applications for LLMs in 2023 was building chatbots that could answer questions about a company's internal documents. The conventional approach to this problem was called RAG, short for retrieval-augmented generation.

    When a user asks a question, a RAG system performs a keyword- or vector-based search to retrieve the most relevant documents. It then inserts those documents into an LLM's context window before generating a response. RAG systems can make for compelling demos, but they tend to work poorly in practice, because a single search will often fail to surface the most relevant documents.
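
    To make the contrast concrete, here is a minimal sketch of that conventional single-pass pipeline in Python. The toy corpus, the keyword scorer, and the `complete` callback are illustrative stand-ins rather than any particular library's API; the point is that retrieval happens exactly once before the model generates.

```python
# Minimal sketch of a conventional single-pass RAG pipeline (illustrative only).
# The keyword scorer stands in for a real keyword or vector index, and
# `complete` stands in for whatever LLM completion call you use.

def keyword_score(query: str, doc: str) -> int:
    """Toy relevance score: how many query words appear in the document."""
    return len(set(query.lower().split()) & set(doc.lower().split()))

def retrieve(query: str, corpus: list[str], k: int = 3) -> list[str]:
    """One search: return the k highest-scoring documents."""
    return sorted(corpus, key=lambda d: keyword_score(query, d), reverse=True)[:k]

def rag_answer(query: str, corpus: list[str], complete) -> str:
    """Retrieve once, stuff the results into the prompt, generate once."""
    docs = retrieve(query, corpus)
    prompt = (
        "Answer the question using only the documents below.\n\n"
        + "\n---\n".join(docs)
        + f"\n\nQuestion: {query}\nAnswer:"
    )
    # If this single retrieval missed the relevant document, the answer suffers.
    return complete(prompt)

if __name__ == "__main__":
    corpus = [
        "The 2024 expense policy caps hotel stays at $250 per night.",
        "Vacation requests must be filed two weeks in advance.",
        "The cafeteria is closed on federal holidays.",
    ]
    fake_llm = lambda prompt: "(model answer would go here)"
    print(rag_answer("What is the hotel limit in the expense policy?", corpus, fake_llm))
```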

    Today it's possible to build much better information-retrieval systems by letting the model itself decide when and what to search. If the first search doesn't turn up the right documents, the model can revise the query and try again. A model might perform five, 20, or even 100 searches before producing an answer.
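
    Here is a sketch of what that iterative version might look like, assuming a chat model exposed as a `chat(messages) -> str` callable that replies with either "SEARCH: <query>" or "ANSWER: <text>". The reply protocol and function names are assumptions for illustration, not a specific vendor's tool-calling API.

```python
# Sketch of an agentic retrieval loop: the model chooses its own queries and
# keeps searching until it decides it can answer (or exhausts its budget).
# `search` and `chat` are placeholders for a real index and a real chat model.

def agentic_answer(question: str, search, chat, max_searches: int = 20) -> str:
    messages = [
        {"role": "system", "content": (
            "Reply 'SEARCH: <query>' to look up documents, or "
            "'ANSWER: <text>' once you have enough information.")},
        {"role": "user", "content": question},
    ]
    for _ in range(max_searches):
        reply = chat(messages)
        messages.append({"role": "assistant", "content": reply})
        if reply.startswith("ANSWER:"):
            return reply[len("ANSWER:"):].strip()
        if reply.startswith("SEARCH:"):
            # The model revises its own query based on what earlier searches returned.
            query = reply[len("SEARCH:"):].strip()
            results = search(query)
            messages.append({"role": "user",
                             "content": "Search results:\n" + "\n".join(results)})
        else:
            messages.append({"role": "user",
                             "content": "Please reply with SEARCH: or ANSWER:."})
    return "No answer found within the search budget."
```

    The `max_searches` budget is the knob alluded to above: depending on the task, a system might allow five, 20, or even 100 rounds before forcing an answer.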

    But this approach only works if a model is agentic: if it can stay on task across multiple rounds of searching and analysis. LLMs were terrible at this before 2024, as the examples of AutoGPT and BabyAGI demonstrated. Today's models are much better at it, which allows modern RAG-style systems to get better results with less scaffolding. You can think of "deep research" tools from OpenAI and others as very powerful RAG systems made possible by long-context reasoning.

    The same point applies to the other agentic applications I mentioned at the start of the article, such as coding and computer-use agents. What these systems have in common is a capacity for iterative reasoning. They think, take an action, think about the result, take another action, and so on.

    Timothy B. Lee was on staff at Ars Technica from 2017 to 2021. Today, he writes Understanding AI, a newsletter that explores how AI works and how it's changing our world. You can subscribe here.