The unveiling of Google's Genie 2 'world model' leaves more questions than answers

    As podcaster Ryan Zhao put it on Bluesky, “The design process has gone wrong when you have to prototype 'what if there was a space.'”

    Gotta go fast

    When Google unveiled the first version of Genie earlier this year, it also published a detailed research paper outlining the specific behind-the-scenes steps used to train the model and how that model generated interactive videos. No such research paper has yet been published for Genie 2, leaving us to guess at some key details.

    One of the most important of these details is the model's speed. The first Genie model generated its world at about one frame per second, a rate orders of magnitude slower than would be reasonably playable in real time. For Genie 2, Google says only that “the samples in this blog post are generated by an undistilled base model, to show what's possible. We can play a distilled version in real time with a reduction in the quality of the output.”

    Reading between the lines, it sounds like the full version of Genie 2 runs at something well below the real-time speeds implied by those flashy GIFs. It's unclear how much “reduction in the quality” is needed to get a distilled version of the model running in real time, but given that Google has shown no examples, we have to assume the reduction is significant.

    Oasis' AI-generated Minecraft clone shows great potential, but still has a lot of rough edges, so to speak.


    Credit: Oasis

    Real-time, interactive AI video generation isn't exactly a pipe dream. Earlier this year, AI model maker Decart and hardware maker Etched published the Oasis model, which showed a human-controllable, AI-generated video clone of Minecraft that runs at a full 20 frames per second. However, that 500 million-parameter model was trained on millions of hours of footage from a single, relatively simple game, and focused solely on the limited set of actions and environmental designs inherent to that game.

    When Oasis launched, its creators freely admitted that the model “struggles with domain generalization,” showing how “realistic” opening scenes had to be reduced to simplistic Minecraft blocks to achieve good results. And even with those limitations, it's not hard to find footage of Oasis degenerating into a horrific nightmare after just a few minutes of play.