OpenAI reportedly nearing breakthrough with 'reasoning' AI, reveals progress framework

    OpenAI recently unveiled a five-tiered system for measuring progress toward artificial general intelligence (AGI), according to an OpenAI spokesperson who spoke with Bloomberg. The company shared the new ranking system with employees during an all-hands meeting on Tuesday, aiming to provide a clear framework for understanding AI progress. However, the system describes hypothetical technology that does not yet exist and is possibly best interpreted as a marketing ploy to garner investment dollars.

    OpenAI has previously stated that AGI, a vague term for a hypothetical AI system that could perform novel tasks like a human without specialized training, is currently the company’s primary goal. The pursuit of technology that could replace humans at most intellectual work drives much of the company’s ongoing hype, even though such technology would likely be massively disruptive to society.

    OpenAI CEO Sam Altman has previously stated that he believes AGI could be achieved within this decade, and much of his public messaging has focused on how the company (and society at large) might navigate the disruption that AGI could bring. In that sense, a ranking system for communicating internally achieved AI milestones on the road to AGI makes sense.

    OpenAI’s five levels, which the company plans to share with investors, range from current AI capabilities to systems that could potentially manage entire organizations. The company believes its technology (such as GPT-4o, the model that powers ChatGPT) is currently at Level 1, which involves AI that can engage in conversational interactions. However, OpenAI executives have reportedly told staff that they’re on the cusp of reaching Level 2, dubbed “Reasoners.”

    Bloomberg lists OpenAI's five “stages of artificial intelligence” as follows:

    • Level 1: Chatbots, AI with conversational language
    • Level 2: Reasoners, human-level problem solving
    • Level 3: Agents, systems that can take actions
    • Level 4: Innovators, AI that can help with inventions
    • Level 5: Organizations, AI that can do the work of an organization

    A Level 2 AI system would reportedly be able to perform basic problem-solving on par with a human who holds a PhD but has no access to external tools. During the all-hands meeting, OpenAI leadership reportedly demonstrated a research project using its GPT-4 model that researchers believe shows signs of approaching this human-like reasoning ability, according to a person familiar with the discussion who spoke with Bloomberg.

    The higher levels of OpenAI’s classification describe increasingly powerful hypothetical AI capabilities. Level 3 “agents” would be able to work autonomously on tasks for days. Level 4 systems would generate new innovations. The pinnacle, Level 5, envisions AI running entire organizations.

    This ranking system is still in development. OpenAI plans to gather feedback from employees, investors, and board members, and potentially refine the levels over time.

    Ars Technica asked OpenAI questions about the ranking system and the accuracy of the Bloomberg report, and a company spokesperson said they had “nothing to add.”

    The Problem with Ranking AI Capabilities

    OpenAI isn’t alone in trying to quantify levels of AI capability. As Bloomberg notes, OpenAI’s system feels similar to the levels of autonomous driving charted by automakers. And in November 2023, researchers at Google DeepMind proposed their own five-tier framework for assessing AI progress, showing that other AI labs have also been trying to figure out how to rank things that don’t yet exist.

    OpenAI’s classification system also bears some resemblance to Anthropic’s “AI Safety Levels” (ASLs), first published by the creator of the Claude AI assistant in September 2023. Both systems aim to categorize AI capabilities, though they focus on different aspects. Anthropic’s ASLs are more explicitly focused on safety and catastrophic risk (such as ASL-2, which refers to “systems showing early signs of dangerous capabilities”), while OpenAI’s levels track general capabilities.

    However, any AI classification system raises questions about whether it’s possible to meaningfully quantify AI progress, and what constitutes progress (or even what constitutes a “dangerous” AI system, as in the case of Anthropic). The tech industry has a history of overpromising on AI capabilities, and linear progression models like OpenAI’s risk fueling potentially unrealistic expectations.

    There is currently no consensus in the AI research community on how to measure progress toward AGI, or even on whether AGI is a well-defined or achievable goal. Therefore, OpenAI’s five-tier system should probably be seen as a communications tool meant to entice investors and showcase the company’s ambitious goals, rather than as a scientific or even technical measure of progress.