Engineers at OpenAI have accidentally deleted data potentially relevant to a copyright lawsuit against the company. The lawsuit, filed by The New York Times and the Daily News, alleges that OpenAI scraped their content without permission to train its AI models.
As part of the lawsuit, OpenAI had agreed to provide virtual machines for the publishers' counsel to search for copyrighted content in its AI training sets. However, on November 14, OpenAI engineers erased all the publishers' search data stored on one of the virtual machines. Although OpenAI was able to recover most of the data, the folder structure and file names were lost, making the recovered data unusable.
The incident has raised concerns over OpenAI's ability to search its own datasets for potentially infringing content and has sparked debate over the fair use of publicly available data in AI model training. OpenAI has maintained that training models using publicly available data is fair use, but the company has also inked licensing deals with several publishers, including the Associated Press and Financial Times. The terms of these deals remain undisclosed.
The case highlights the ongoing struggle to balance the need for AI model training data with the rights of content creators. As AI technology continues to evolve, the tech and startup communities will be watching closely to see what this case means for the future of AI development.