Google's AI research organization, DeepMind, has unveiled Genie 2, a model that it says can generate an "endless" variety of playable 3D worlds. The technology could have applications in gaming, research, and the creative industries.
Genie 2, the successor to DeepMind's earlier Genie model, can create interactive, real-time scenes from a single image and a text description. For instance, a user can enter the prompt "A cute humanoid robot in the woods" and the model will generate a 3D world containing those elements. The approach resembles models being developed by Fei-Fei Li's company, World Labs, and by the Israeli startup Decart.
DeepMind claims that Genie 2 can generate a vast diversity of rich 3D worlds, including environments in which users can take actions such as jumping and swimming using a mouse or keyboard. Because the model was trained on videos, it can simulate object interactions, animations, lighting, physics, reflections, and the behavior of non-player characters (NPCs). Some of the resulting simulations resemble AAA video games.
However, the use of video game playthroughs in the model's training data raises intellectual property questions. As a Google subsidiary, DeepMind has unfettered access to YouTube, and Google has previously implied that its terms of service grant it permission to use YouTube videos for model training. Even so, it remains unclear whether generating unauthorized copies of games with AI models like Genie 2 is legal.
Genie 2 can generate consistent worlds from different perspectives, such as first-person and isometric views, for up to a minute, although most simulations last between 10 and 20 seconds. The model responds intelligently to user actions, identifying the controllable character and moving it correctly. For example, it can work out that the arrow keys should move a robot rather than the trees or clouds around it.
Unlike many other world models, Genie 2 can remember parts of a simulated scene that are out of view and render them accurately when they become visible again. Decart's Minecraft simulator, Oasis, by contrast, runs at a low resolution and quickly "forgets" the layout of levels.
Games created with Genie 2 might not be much fun to play, since the model's limitations mean progress is erased roughly every minute. DeepMind is therefore positioning the model as a research and creative tool for prototyping interactive experiences and evaluating AI agents. Because it can rapidly create rich and diverse environments, it can produce evaluation tasks that agents have not seen during training.
DeepMind believes that Genie 2 will be a key component in developing the AI agents of the future. Google has been investing heavily in world models, which are widely expected to become a major area of AI research. The company recently hired Tim Brooks, who had been leading development of OpenAI's Sora video generator, to work on video generation technologies and world models.
For now, Genie 2 remains a research and prototyping tool rather than a way to build finished games. How it is ultimately applied, and what effect it has on the development of AI agents and interactive experiences, will become clearer as the technology evolves.