
To communicate with the real world, AI will acquire physical intelligence

    Recent AI models are surprisingly human-like in their ability to generate text, audio and video on request. Until now, however, these algorithms have largely remained confined to the digital world rather than the physical, three-dimensional world we live in. Whenever we try to apply these models to the real world, even the most sophisticated struggle to perform adequately – consider, for example, how challenging it has been to develop safe and reliable self-driving cars. Although artificially intelligent, these models simply do not understand physics, and they often hallucinate, leading them to make inexplicable errors.

    However, this is the year when AI will finally make the leap from the digital world to the physical world we live in. Expanding AI beyond its digital boundaries requires an overhaul of the way machines think, combining the digital intelligence of AI with the mechanical prowess of robotics. This is what I call 'physical intelligence': a new form of intelligence for machines that can understand dynamic environments, cope with unpredictability and make decisions in real time. Unlike standard AI models, physical intelligence is rooted in physics – in an understanding of the fundamental principles of the real world, such as cause and effect.

    Such features allow physical intelligence models to communicate with and adapt to different environments. In my research group at MIT, we develop models of physical intelligence that we call liquid networks. In one experiment, for example, we trained two drones – one controlled by a standard AI model and the other by a liquid network – to locate objects in a forest during the summer, using data collected by human pilots. Although both drones performed equally well when asked to do exactly what they had been trained to do, only the liquid network drone successfully completed its task when asked to locate objects in different conditions, such as in winter or in an urban environment. This experiment showed us that, unlike traditional AI systems that stop evolving after their initial training phase, liquid networks continue to learn and adapt from experience, much as humans do.
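    To make the idea of a network whose dynamics keep responding to its inputs more concrete, here is a minimal sketch of a liquid time-constant cell, the building block described in the liquid-network literature (Hasani et al.). The layer sizes, parameter names and the simple Euler integration step are illustrative assumptions, not the implementation actually flown on the drones described above.

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy dimensions: a few sensor channels feeding a small hidden state.
    n_inputs, n_hidden = 4, 8
    W_in = rng.normal(size=(n_hidden, n_inputs)) * 0.5   # input weights
    W_rec = rng.normal(size=(n_hidden, n_hidden)) * 0.5  # recurrent weights
    bias = np.zeros(n_hidden)
    tau = np.ones(n_hidden)   # base time constants of each unit
    A = np.ones(n_hidden)     # value toward which the state is pulled

    def ltc_step(x, u, dt=0.05):
        """One Euler step of dx/dt = -(1/tau + f) * x + f * A,
        where f depends on both the current state x and the input u.
        Because f modulates the effective time constant, the cell's
        dynamics change with the input stream it receives."""
        f = np.tanh(W_rec @ x + W_in @ u + bias)
        dxdt = -(1.0 / tau + f) * x + f * A
        return x + dt * dxdt

    # Drive the cell with a short synthetic input sequence.
    x = np.zeros(n_hidden)
    for t in range(100):
        u = np.sin(0.1 * t) * np.ones(n_inputs)  # stand-in for sensor readings
        x = ltc_step(x, u)

    print("final hidden state:", np.round(x, 3))

    Because the function f shapes both the state's time constant and the value it is pulled toward, the cell's response is governed by the incoming data itself; this is one way to read the claim that such networks keep adapting after their initial training.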

    Physical intelligence is also capable of interpreting and physically executing complex commands derived from text or images, bridging the gap between digital instructions and real-world execution. For example, in my lab we have developed a physically intelligent system that can iteratively design and then 3D print small robots in less than a minute, based on prompts such as “robot that can walk forward” or “robot that can grab objects”.

    Other laboratories are also making important breakthroughs. For example, robotics startup Covariant, founded by UC Berkeley researcher Pieter Abbeel, is developing chatbots – similar to ChatGPT – that can control robotic arms when requested. The company has already raised more than $222 million to develop and deploy sorting robots in warehouses around the world. A team from Carnegie Mellon University also recently demonstrated that a robot with just one camera and imprecise actuation can perform dynamic, complex parkour movements – including jumping onto obstacles twice its height and across gaps twice its length – using a single neural network trained via reinforcement learning.

    If 2023 was the year of text-to-image and 2024 the year of text-to-video, then 2025 will mark the age of physical intelligence, with a new generation of devices – not just robots, but everything from electricity grids to smart homes – that can interpret what we tell them and perform tasks in the real world.