- Google’s video generation model got a major upgrade
- Announced at Google I/O, Veo 3 can combine audio and video in its output
- It’s an Ultra and US-only feature for now
AI video generation tools such as Sora and Pika can create alarmingly realistic bits of video, and with enough effort, you can tie those clips together to create a short film. One thing they can’t do, though, is simultaneously generate audio. Google’s new Veo 3 model can, and that could be a game changer.
Announced on Tuesday at Google I/O 2025, Veo 3 is the third generation of the powerful Gemini video generation model. With the right prompt, it can produce videos that include sound effects, background noises, and, yes, dialogue.
Google briefly demonstrated this capability for the video model. The clip was a CGI-grade animation of some animals talking in a forest. The sound and video were in perfect sync.
If the demo translates into real-world use, this represents a remarkable tipping point in the AI content generation space.
“We’re emerging from the silent era of video generation,” said Google DeepMind CEO Demis Hassabis in a press call.
Lights, camera, audio
He isn’t wrong. Thus far, no other AI video generation model can simultaneously deliver synchronized audio, or audio of any kind, to accompany video output.
It’s still not clear whether Veo 3 surpasses OpenAI’s Sora, the current video generation leader, in video quality. Like its predecessor, Veo 2, it should be able to output 4K video. Google has, in the past, claimed that Veo 2 is adept at producing realistic and consistent movement.
Regardless, outputting what appears to be fully produced video clips (video and audio) may instantly make Veo a more attractive platform.
It’s not just that Veo 3 can handle dialogue. In the world of film and TV, background noises and sound effects are often the work of Foley artists. Now, imagine if all you need to do is describe to Veo the sounds you want behind and attached to the action, and it outputs it all, including the video and dialogue. This is work that takes animators weeks or months to do.
In a release on the new model, Google suggests you tell the AI “a short story in your prompt, and the model gives you back a clip that brings it to life.”
If Veo 3 can follow prompts and output minutes or, ultimately, hours of consistent video and audio, it won’t be long before we’re viewing the first animated feature generated entirely through Veo.
Veo 3 is live today in the US as part of the new Ultra tier ($249.99 a month) in the Gemini app, and also through the new Flow tool.
Google also announced a few updates to its Veo 2 video generation model, including the ability to generate video based on reference objects you provide, camera controls, outpainting to convert from portrait to landscape, and object add and erase.
lance.ulanoff@futurenet.com (Lance Ulanoff)