Google DeepMind Unveils Veo 2, a Next-Gen Video-Generating AI with Higher Resolution and Longer Clips

Jordan Vega

December 16, 2024 · 4 min read

Google DeepMind, the flagship AI research lab of Google, has unveiled Veo 2, a next-generation video-generating AI that, on paper, surpasses OpenAI's Sora in both resolution and duration. Veo 2 can create videos at up to 4K resolution (4096 x 2160 pixels) and up to 2 minutes long, compared with Sora's limits of 1080p and 20-second clips.
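
To put those figures in proportion, here is a quick back-of-the-envelope calculation. It is only a sketch using the numbers quoted above, and it assumes that 1080p means the standard 1920 x 1080 frame, which the announcement does not spell out.

```python
# Figures quoted in the article for each model's maximum output.
veo2_width, veo2_height = 4096, 2160   # 4K, as stated above
sora_width, sora_height = 1920, 1080   # assumption: standard 1080p frame size

veo2_seconds = 2 * 60                  # 2 minutes
sora_seconds = 20                      # 20 seconds

# Compare pixels per frame and maximum clip length.
pixel_ratio = (veo2_width * veo2_height) / (sora_width * sora_height)
duration_ratio = veo2_seconds / sora_seconds

print(f"Pixels per frame: {pixel_ratio:.2f}x")
print(f"Clip length: {duration_ratio:.0f}x")
```

By this rough measure, Veo 2's stated ceiling is a little over four times Sora's pixel count per frame and six times its clip length.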

The new AI model is an upgrade to Veo, which currently powers a growing number of products across Google's portfolio. Veo 2 is initially available exclusively on VideoFX, Google's experimental video creation tool, where, for now, videos are capped at 720p and 8 seconds in length. Google plans to expand the number of users who can access VideoFX this week and eventually make Veo 2 available via its Vertex AI developer platform.

Eli Collins, VP of product at DeepMind, stated that the company will continue to iterate on Veo 2 based on user feedback and integrate its updated capabilities into compelling use cases across the Google ecosystem. Collins added that the company expects to share more updates next year.

Veo 2 boasts improved "understanding" of physics and camera controls, producing "clearer" footage with sharper textures and images, especially in scenes with a lot of movement. The model can also simulate motion, fluid dynamics, and properties of light more realistically, including different lenses and cinematic effects. Additionally, Veo 2 can generate videos in a range of styles, from realistic to Pixar-style animation.

Despite its impressive capabilities, Veo 2 still struggles with the "uncanny valley" phenomenon, where AI-generated videos can appear lifeless or unnatural. Collins acknowledged that coherence and consistency are areas for growth, and the company is working with artists and producers to refine its video generation models and tooling.

DeepMind trained Veo 2 on a large dataset of videos, although the exact source of the training data is unclear. The company maintains that training models using public data is fair use, but this approach has raised concerns among creatives who fear that AI models may infringe on their rights by training on content without consent.

To mitigate the risks associated with generative models, DeepMind is using prompt-level filters, including for violent, graphic, and explicit content. The company is also employing its proprietary watermarking technology, SynthID, to embed invisible markers into frames generated by Veo 2. However, like all watermarking tech, SynthID is not foolproof.

In addition to Veo 2, Google DeepMind announced upgrades to Imagen 3, its commercial image generation model. The new version of Imagen 3 can create "brighter, better-composed" images and photos in various styles, and is rolling out to users of ImageFX, Google's image-generating tool, beginning today.

The upgrades to Imagen 3 are accompanied by UI updates to ImageFX, which now include a feature that turns key terms in user prompts into "chiplets" with a drop-down menu of suggested, related words. This allows users to iterate on their prompts and select from a row of auto-generated descriptors beneath the prompt.

Overall, Veo 2 represents a significant advancement in video generation technology, and its potential applications across the Google ecosystem are vast. As the company continues to refine its models and tooling, it will be important to address the concerns around data ownership and consent that have arisen in the AI community.

Copyright © 2024 Starfolk. All rights reserved.