Meta just announced its own media-focused AI model called Movie Gen, which can be used to generate realistic video and audio clips.
The company shared several 10-second clips generated with Movie Gen, including a Moo Deng-like baby hippo swimming around, to demonstrate its capabilities. While the tool isn't available for use yet, the Movie Gen announcement comes shortly after the Meta Connect event, where the company showcased new and updated hardware and the latest version of its large language model, Llama 3.2.
The Movie Gen model goes beyond generating simple text-to-video clips and can perform targeted edits to an existing clip, such as adding an object to someone's hands or changing the appearance of a surface. In one of Meta's sample videos, a woman wearing a VR headset was transformed to appear as if she were wearing steampunk binoculars.
Movie Gen can also generate audio clips to accompany the videos. In the sample clips, an AI-generated man stands near a waterfall with audible splashes and the hopeful sounds of a symphony; the engine of a sports car purrs and its tires screech as it zooms around a track; and a snake slithers across the jungle floor, accompanied by thrilling horns.
Meta shared more details about Movie Gen in a research paper released on Friday. Movie Gen Video consists of 30 billion parameters, while Movie Gen Audio consists of 13 billion parameters. (The number of parameters in a model roughly corresponds to how capable it is; the largest variant of Llama 3.1, by contrast, has 405 billion parameters.) Movie Gen can produce high-definition videos up to 16 seconds long, and Meta claims it outperforms competing models in overall video quality.
Earlier this year, CEO Mark Zuckerberg demonstrated Meta AI's Imagine Me feature, which lets users upload a photo of themselves and place their face into multiple scenarios, by posting an AI image of himself drowning in gold chains on Threads. A video version of a similar feature is possible with the Movie Gen model. Think of it as a kind of ElfYourself on steroids.
What information is Movie Gen trained on? The details aren't clear in Meta's announcement post: “We trained these models on a combination of licensed and publicly available datasets.” The sources of training data, and what can fairly be scraped from the internet, remain a contentious issue for generative AI tools, and it is rarely publicly known which text, videos, or audio clips were used to create any of the major models.
It will be interesting to see how long it takes for Meta to make Movie Gen widely available. The announcement blog vaguely gestures at a “potential future release.” By comparison, OpenAI announced its AI video model, Sora, earlier this year and has not yet made it available to the public or shared an upcoming release date (although WIRED did receive a few exclusive Sora clips from the company for an investigation into bias).
Given Meta's legacy as a social media company, it's possible that tools powered by Movie Gen will eventually start appearing inside Facebook, Instagram, and WhatsApp. In September, competitor Google shared plans to make aspects of its Veo video model available to creators inside YouTube Shorts sometime next year.
While larger tech companies are still holding off on fully releasing video models to the public, you can now experiment with AI video tools from smaller, emerging startups like Runway and Pika. Give Pikaffects a try if you've ever been curious what it would be like to see yourself cartoonishly crushed by a hydraulic press or suddenly melted into a puddle.