OpenAI Accused of Deleting Evidence in Copyright Infringement Case

Lawyers for The New York Times and Daily News have accused OpenAI engineers of deleting data potentially relevant to their copyright infringement lawsuit against the AI startup. The deleted data was stored on a virtual machine that OpenAI provided so the plaintiffs could search its AI training sets for their copyrighted content.

The incident occurred on November 14, 2024, when OpenAI engineers erased all the publishers' search data stored on one of the virtual machines. Although OpenAI attempted to recover the data, the folder structure and file names were "irretrievably" lost, rendering the recovered data unusable for determining where the news plaintiffs' copied articles were used to build OpenAI's models.

The plaintiffs' counsel has stated that they have no reason to believe the deletion was intentional, but the incident has raised concerns about OpenAI's ability to search its own datasets for potentially infringing content using its own tools. The lawyers have been forced to recreate their work from scratch, investing significant person-hours and computer processing time.

In response to the allegations, OpenAI's counsel has denied that the company deleted any evidence, instead suggesting that the plaintiffs were to blame for a system misconfiguration that led to a technical issue. According to OpenAI's attorneys, the plaintiffs requested a configuration change to one of the virtual machines, which resulted in removing the folder structure and some file names on one hard drive.

The underlying issue in the lawsuit is OpenAI's practice of training its AI models using publicly available data, including articles from The New York Times and Daily News, without obtaining permission or paying royalties. OpenAI maintains that this is fair use, but the publishers argue that the company's use of their copyrighted content without permission constitutes infringement.

Notably, OpenAI has entered into licensing agreements with several publishers, including the Associated Press, Business Insider owner Axel Springer, Financial Times, People parent company Dotdash Meredith, and News Corp. While the terms of these deals remain undisclosed, one content partner, Dotdash Meredith, is reportedly receiving at least $16 million per year.

The outcome of this case could have significant implications for the AI industry, which relies heavily on large datasets to train its models. If OpenAI is found to have infringed on the publishers' copyrights, it could set a precedent for future lawsuits and force AI companies to reexamine their data sourcing practices.

As the legal battle unfolds, it remains to be seen how the court will rule on fair use and on AI companies' responsibility to avoid infringing copyrighted content. The stakes are high, and the outcome is likely to have far-reaching consequences for the tech industry.
