In a recent appearance on the "Possible" podcast, Google DeepMind CEO Demis Hassabis disclosed the company's intention to integrate its Gemini AI models with its Veo video-generating models. This move aims to significantly enhance the AI's comprehension of the physical world, marking a crucial step towards creating a universal digital assistant.
Hassabis emphasized that Gemini, the foundation model, was designed to be multimodal from its inception, with the ultimate goal of developing an assistant that can effectively aid users in the real world. By combining Gemini with Veo, Google seeks to leverage the strengths of both models to create a more sophisticated AI capable of understanding and interacting with the physical environment.
The AI industry is shifting towards "omni" models, which can process and generate several forms of media, including images, text, audio, and video. Google's latest Gemini models can already generate audio, images, and text, while the default model in OpenAI's ChatGPT can natively create images. Amazon has also announced plans to launch an "any-to-any" model later this year, further underscoring the trend towards multimodal AI.
Training these omni models requires vast amounts of data, including images, videos, audio, and text. Hassabis hinted that the video data for Veo is primarily sourced from YouTube, a platform owned by Google. By learning from YouTube's vast video repository, Veo 2 can learn the physics of the world.
Google previously told TechCrunch that its models "may be" trained on "some" YouTube content, in accordance with its agreement with YouTube creators. The company reportedly broadened its terms of service last year to allow more data to be used to train its AI models. This move has sparked concerns among content creators, who may not be aware that their videos are being used for AI training.
Google's plan to unify Gemini and Veo could have applications across industries, including healthcare, education, and customer service. As AI models become more sophisticated, they may be able to provide more accurate and personalized assistance, changing how people interact with technology.
However, the development of omni models also raises important questions about data ownership, privacy, and accountability. As AI models become more pervasive, it is essential to establish clear guidelines and regulations governing the use of data and the development of AI capabilities. Google's move to combine Gemini and Veo serves as a reminder of the need for transparency and accountability in the AI industry.
With its vast resources and expertise, Google is well-positioned to drive innovation in AI, and its plan to unify Gemini and Veo is likely to influence the rest of the industry. That position makes responsible AI development all the more important.