On Tuesday, Google released Veo 3, a new AI video synthesis model that can do something no major AI video generator could do before: create a synchronized audio track. From 2022 to 2024 we saw early steps in AI video generation, but every video was silent and usually very short or expensive to produce. Now you can hear voices, dialogue, and sound effects in eight-second high-definition video clips.
Shortly after the launch, people began asking the most obvious benchmarking question: how good is Veo 3 at faking Oscar-winning actor Will Smith eating spaghetti?
First, a bit of history. The spaghetti benchmark in AI video traces its origins to March 2023, when we first covered an early example of horrific AI-generated video made with an open source video synthesis model called ModelScope. The spaghetti example later became well-known enough that Smith parodied it almost a year later, in February 2024.
Here's what the original viral video looked like:
One thing people tend to forget is that, at the time, the Smith example did not come from the best AI video generator available: a video synthesis model called Gen-2 from Runway had already achieved superior results (although it was not yet publicly accessible). But the ModelScope result was funny and weird enough to stick in people's memories as an early bad example of video synthesis, useful for future comparisons as AI models progressed.
Earlier this week, AI app developer Javi Lopez came to the rescue for curious spaghetti fans, running the Smith test with Veo 3 and posting the results on X. But as you'll notice below when you watch, the soundtrack has a bizarre quality: the faux Smith appears to be crunching on the spaghetti.
On X, Javi Lopez ran "Will Smith eating spaghetti" through Google's Veo 3 AI video generator and received this result.
It's a glitch in Veo 3's experimental capacity to apply sound effects to video, likely because the training data used to create Google's AI model contained many examples of chewing mouths paired with crunching sound effects. Generative AI models are pattern-matching prediction machines, and they need to see enough examples of different types of media to generate convincing new outputs. If a concept is over-represented or under-represented in the training data, you'll see unusual generation results, such as jabberwockies.