Artificial intelligence (AI) pioneers are abuzz about world models, also known as world simulators, which could reshape fields including video generation, forecasting, and robotics. World models take inspiration from the mental models that humans develop naturally, which let people make predictions about their surroundings and make sense of what they see.
Recently, AI pioneer Fei-Fei Li's World Labs raised $230 million to build "large world models," and DeepMind hired one of the creators of OpenAI's video generator, Sora, to work on "world simulators." This surge of interest in world models is largely due to their promising applications in the field of generative video. Currently, most AI-generated videos veer into uncanny valley territory, with bizarre and unrealistic scenarios. However, world models with a basic grasp of why objects behave in certain ways could significantly improve video generation.
A world model is trained on a range of data, including photos, audio, videos, and text, to create internal representations of how the world works and reason about the consequences of actions. This enables the model to understand why a basketball bounces in a certain way, for instance, rather than just predicting the outcome. According to Alex Mashrabov, Snap's ex-AI chief and CEO of Higgsfield, a strong world model can help create more realistic and immersive video experiences, allowing viewers to feel like they are part of the scene.
The potential applications of world models extend far beyond video generation, however. Researchers, including Meta chief AI scientist Yann LeCun, believe that world models could someday be used for sophisticated forecasting and planning in both the digital and physical realms. LeCun envisions a world model that reasons its way to a desired goal, such as getting a dirty room clean by deploying vacuums and washing the dishes. While LeCun estimates that we are at least a decade away from this level of capability, today's world models are already showing promise as elementary physics simulators.
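To make the planning idea concrete, here is a minimal sketch of the room-cleaning example in Python. It is a toy illustration under stated assumptions, not how LeCun or any lab builds world models: the Room state, the three action names, and the hand-written predict function are all hypothetical stand-ins for what a real system would learn from data. The planner simply searches over imagined futures until it finds a sequence of actions that reaches the goal.

```python
from collections import deque
from typing import List, NamedTuple, Optional


class Room(NamedTuple):
    """Hypothetical state of the room; a real model would learn its own representation."""
    floor_dirty: bool
    dishes_dirty: bool


# Hypothetical action set, chosen only for this illustration.
ACTIONS = ("vacuum", "wash_dishes", "wait")


def predict(state: Room, action: str) -> Room:
    """Toy stand-in for a learned world model: predict the state after an action."""
    if action == "vacuum":
        return state._replace(floor_dirty=False)
    if action == "wash_dishes":
        return state._replace(dishes_dirty=False)
    return state  # "wait" changes nothing


def plan(start: Room, goal: Room, max_depth: int = 5) -> Optional[List[str]]:
    """Breadth-first search over imagined futures.

    Returns the shortest action sequence that reaches the goal,
    or None if the goal is unreachable within max_depth steps.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        if len(actions) >= max_depth:
            continue
        for action in ACTIONS:
            nxt = predict(state, action)
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, actions + [action]))
    return None


if __name__ == "__main__":
    print(plan(Room(floor_dirty=True, dishes_dirty=True),
               Room(floor_dirty=False, dishes_dirty=False)))
```

The point of the sketch is the division of labor: predict answers "what happens if I do this?", and plan uses those answers to pick actions without ever touching the real room. In a real system the hand-written rules would be replaced by a model trained on data, and the search would have to cope with uncertainty and far larger state spaces.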
OpenAI's Sora, which the company considers a world model, can simulate actions like a painter leaving brush strokes on a canvas and can effectively simulate video games. Future world models may be able to generate 3D worlds on demand for gaming, virtual photography, and more. According to World Labs co-founder Justin Johnson, world models will enable the creation of fully simulated, vibrant, and interactive 3D worlds, revolutionizing industries such as gaming and virtual reality.
Despite the excitement surrounding world models, significant technical challenges remain. Training and running world models require massive compute power, and the models are prone to hallucinating and to internalizing biases in their training data. A general lack of training data threatens to exacerbate these issues, making it essential to develop diverse and highly specific training datasets. Additionally, world models need to generate consistent maps of an environment and then navigate and interact within it.
If these hurdles are overcome, world models could "more robustly" bridge AI with the real world, leading to breakthroughs in virtual world generation, robotics, and AI decision-making. They could also enable the development of more capable robots by giving them an awareness of the world around them and of their own bodies. As Mashrabov notes, an advanced world model could allow an AI to develop a personal understanding of whatever scenario it's placed in and start to reason out possible solutions.
In conclusion, world models have the potential to revolutionize various fields, from video generation to robotics and forecasting. While significant technical challenges remain, the potential rewards are substantial, and researchers and industry leaders are eager to explore the possibilities of this emerging technology.