Eli Collins, vice president of product management at Google DeepMind, first demonstrated generative AI video tools to the company’s board of directors in 2022. Despite the model’s slow speed, high operational costs, and sometimes skewed outcomes, he said it was eye-opening for them to see new video clips generated based on a random prompt.
Now, just a few years later, Google has announced plans for a tool in the YouTube app that will allow anyone to generate AI video clips using the company’s Veo model and post them directly as part of YouTube Shorts. “By 2025, we’re going to enable users to create standalone video clips and shorts,” says Sarah Ali, a senior director of product management at YouTube. “They’ll be able to generate six-second videos from an open-ended text prompt.” Ali says the update could help creators when they’re scrambling for footage to fill a video or trying to visualize something fantastical. She’s adamant that the Veo AI tool isn’t meant to replace creativity, but to amplify it.
This isn’t the first time Google has introduced generative tools to YouTube, though this announcement will mark the company’s most expansive AI video integration to date. Last summer, Google launched an experimental tool called Dream Screen for generating AI backgrounds for videos. Ahead of the full rollout of generated clips next year, Google will be updating that AI green screen tool with the Veo model sometime in the coming months.
The sprawling tech company has shown off multiple AI video models in recent years, like Imagen and Lumiere, but is now trying to consolidate around a single vision with the Veo model. “Veo is going to be our model, by the way, going forward,” Collins says. “You shouldn’t expect to have five more models.” Yes, Google will likely release another video model eventually, but he expects to focus on Veo for the foreseeable future.
Google is facing competition from several startups developing their own generative text-to-video tools. OpenAI’s Sora is the highest-profile contender, but the AI video model, announced in early 2024, isn’t yet publicly available and is reserved for a small number of testers. As for tools that are generally available, AI startup Runway has released multiple versions of its video software, including a recent tool for altering original videos into alternate-reality versions of the clip.
YouTube’s announcement comes as generative AI tools have become even more contentious for creators, who have sometimes seen the current wave of AI as a theft of their work and an attempt to undermine the creative process. Ali doesn’t see generative AI tools coming between creators and the authenticity of their relationship with viewers. “It’s really about the audience and what they’re interested in, not necessarily the tools,” she says. “But if your audience is interested in how you made it, that will be public through the description.” Google plans to watermark every AI video generated for YouTube Shorts with SynthID, which embeds an unnoticeable tag to identify the video as synthetic, and include a “made with AI” disclaimer in the description.
Hustle culture influencers are already trying to circumvent the algorithm by using multiple third-party tools to automate the creative process and monetize with minimal effort. Will next year’s Veo integration lead to another avalanche of low-quality, spammy YouTube Shorts dominating user feeds? “I think our experience with recommending the right content to the right viewer works in this AI world of scale, because we’ve done it at this massive scale,” Ali says. She also points out that YouTube’s standard guidelines still apply, regardless of which tool is used to create the video.
AI art often has a distinct aesthetic, which can be a concern for creators who value individuality and want their content to feel unique. Collins hopes that Google’s fingerprints won’t be all over the AI video output. “I don’t want people to look at this and say, ‘Oh, that’s the DeepMind model,’” he says. Getting the prompt to produce an AI output that matches what the creator intended is a key goal, and avoiding overt aesthetics is crucial for Veo to achieve broad customizability.
“A big part of the journey is actually building something that’s useful to people, scalable, and deployable,” Collins says. “It’s not just a demo. It’s being used in a real product.” He believes putting generative AI tools into the YouTube app will be transformative for creators, and for DeepMind as well. “We’ve never really built a product for creators before,” he says. “And we’ve certainly never done it at this scale.”