OpenAI's latest AI model, Sora, has been making waves in the tech community with its impressive video generation capabilities. However, a closer look at the model's training data has raised concerns about copyright infringement, particularly with regard to video game playthroughs. Sora appears to have been trained on a vast amount of video game footage, including popular titles like Minecraft, Super Mario Bros., and Call of Duty.
The model's ability to generate gameplay footage that closely resembles the original games has drawn scrutiny from game developers and industry experts. Joshua Weigensberg, an IP attorney at Pryor Cashman, notes that "companies that are training on unlicensed footage from video game playthroughs are running many risks." He explains that training a generative AI model involves copying the training data, which in this case includes copyrighted materials.
OpenAI has been tight-lipped about the exact sources of Sora's training data, but it's clear that the model has been trained on a vast amount of publicly available data, including YouTube, Instagram, and Facebook content. The company has also acknowledged using licensed data from stock media libraries like Shutterstock. However, the lack of transparency has raised concerns about the potential infringement of game developers' intellectual property rights.
The implications of Sora's training data go beyond just copyright infringement. That the model can produce footage closely resembling existing games also raises questions about fair use and transformative works. Evan Everist, an attorney at Dorsey & Whitney specializing in copyright law, notes that "videos of playthroughs involve at least two layers of copyright protection: the contents of the game as owned by the game developer, and the unique video created by the player or videographer capturing the player's experience."
The risks associated with Sora's training data are not limited to copyright infringement. The model's output could also violate trademark rights and potentially implicate name, image, and likeness rights. Furthermore, the growing interest in world models, which generate video games in real time, could further complicate the legal landscape.
Industry experts are calling for greater transparency and accountability from AI companies like OpenAI. "Unless these works have been properly licensed, training on them may infringe," Weigensberg notes. The lack of clear guidelines and regulations surrounding AI-generated content has created a legal gray area, and it's unclear how courts will rule on these issues in the future.
In the meantime, game developers and publishers are taking a cautious approach. Few were willing to comment on the record: CD Projekt Red, the developer of Cyberpunk 2077, stated that it "won't be able to get involved in an interview at the moment," and EA told TechCrunch that it "didn't have any comment at this time."
The controversy surrounding Sora highlights the need for greater transparency and accountability in the development of AI models. As AI-generated content becomes increasingly sophisticated, it's essential that developers and policymakers address the legal and ethical implications of these technologies.