According to tech pundits and professional skeptics fixated on the market, the artificial intelligence bubble has burst and winter is back. Fei-Fei Li doesn’t think so. In fact, Li, who has been nicknamed the “godmother of AI,” is betting on the opposite. She’s on part-time leave from Stanford University to start a company called World Labs. While current generative AI is language-based, she sees a frontier where systems construct entire worlds with the physics, logic and rich detail of our physical reality. It’s an ambitious goal, and despite the dour nabobs who say that progress in AI has reached a grim plateau, World Labs is on the fast track to funding. The startup may be a year away from a product — and it’s far from clear how well it will perform when or if it does — but investors have poured in $230 million, reportedly valuing the nascent startup at $1 billion.
About 15 years ago, Li helped AI turn a corner by creating ImageNet, a custom database of digital images that made neural networks dramatically smarter. She argues that today’s deep-learning models need a similar boost if AI is to create real worlds, whether they’re realistic simulations or entirely fictional universes. Future George R. R. Martins could compose their dreamed-up worlds as prompts instead of prose, then render and wander around in them. “The physical world for computers is seen through cameras, and the computer brain is behind the cameras,” Li says. “To translate that vision into reasoning, generation, and ultimately interaction requires understanding the physical structure, the physical dynamics of the physical world. And that technology is called spatial intelligence.” World Labs calls itself a spatial intelligence company, and its fate will help determine whether that term becomes a revolution or a punchline.
Li has been obsessed with spatial intelligence for years. While everyone else was raving about ChatGPT, she and a former student, Justin Johnson, would chat excitedly on phone calls about the next iteration of AI. “The next decade is going to be about generating new content that takes computer vision, deep learning, and AI out of the internet world and puts them in space and time,” says Johnson, now an assistant professor at the University of Michigan.
Li decided to start a company in early 2023 after a dinner with Martin Casado, a virtual networking pioneer who is now a partner at Andreessen Horowitz, the VC firm notorious for its near-messianic embrace of AI. Casado sees AI following a path similar to that of computer games, which started with text, moved to 2D graphics, and now have dazzling 3D visuals. In his view, spatial intelligence will drive the change. Eventually, he said, “you’ll be able to take your favorite book, throw it into a model, and then literally step inside it and watch it play out in real time, in an immersive way.” The first step to making that happen, Casado and Li agreed, is moving from large language models to large world models.
Li began assembling a team, with Johnson as a co-founder. Casado suggested two more people. One was Christoph Lassner, who had worked at Amazon, Meta’s Reality Labs, and Epic Games. He’s the inventor of Pulsar, a rendering scheme that led to a celebrated technique called 3D Gaussian Splatting. It sounds like an indie band at an MIT toga party, but it’s actually a way to synthesize whole scenes rather than one-off objects. Casado’s other suggestion was Ben Mildenhall, who had co-developed a powerful technique called NeRF (neural radiance fields), which turns a set of ordinary 2D photos into a 3D scene that can be viewed from new angles. “We were taking real-world objects and making them look completely real,” he says. He left his role as a senior research scientist at Google to join Li’s team.
An obvious goal of a large world model would be to give robots, well, a sense of the world. That’s in World Labs’ plan, but not for a while. The first phase is to build a model with a deep understanding of three-dimensionality, physicality, and concepts of space and time. Then comes a phase where the models support augmented reality. After that, the company can tackle robotics. If this vision is fulfilled, large world models will enhance autonomous cars, automated factories, and perhaps even humanoid robots.