OpenAI Bot Wreaks Havoc on Ecommerce Site, Highlights Loophole in Data Scraping

Max Carter

January 10, 2025 · 3 min read

On Saturday, Triplegangers CEO Oleksandr Tomchuk received an alert that his company's ecommerce site was down, hit by what looked like a distributed denial-of-service (DDoS) attack. Further investigation revealed that the culprit was a bot from OpenAI, relentlessly attempting to scrape the entire site: over 65,000 products, each with at least three photos.

The bot, identified as GPTBot, used 600 IP addresses to send tens of thousands of server requests, attempting to download hundreds of thousands of photos along with their detailed descriptions. Tomchuk described the incident as a "DDoS attack" that crushed his site, causing significant disruptions to his business.

Triplegangers' website is its primary business, offering a vast database of 3D image files scanned from actual human models. The company sells these files, as well as photos, to 3D artists, video game makers, and anyone requiring authentic human characteristics. The site's terms of service explicitly forbid bots from taking images without permission, but this did not deter OpenAI's bot.

The issue lies in the absence of a properly configured robots.txt file, which tells search engines and other crawlers what not to crawl as they index the web. OpenAI says it honors such files when they are configured with its own do-not-crawl directives, but Tomchuk's experience suggests that this is not always the case. In the absence of a properly configured robots.txt file, AI companies like OpenAI assume they can scrape data at will, putting the onus on website owners to protect their content.
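OpenAI's public guidance is that its crawler identifies itself as GPTBot and respects robots.txt rules addressed to it. A minimal sketch of a robots.txt that opts out of GPTBot while leaving other crawlers alone (the catch-all rule is illustrative, not Triplegangers' actual file):

```
# Served at the site root, e.g. https://example.com/robots.txt
# Block OpenAI's crawler from the entire site
User-agent: GPTBot
Disallow: /

# Leave all other crawlers unaffected
User-agent: *
Disallow:
```

A rule like this only works if the crawler chooses to honor it, which is exactly the gap Tomchuk ran into.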

Tomchuk's team eventually set up a properly configured robots.txt file and a Cloudflare account to block GPTBot, as well as other crawlers. However, he remains concerned about the lack of transparency and accountability in the data scraping process. With no way to contact OpenAI or determine what data was taken, Tomchuk is left to wonder about the extent of the scraping.
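Cloudflare offers bot blocking as a managed setting, but a web server can also refuse such requests directly. A hedged sketch in nginx, illustrative rather than Triplegangers' actual configuration:

```nginx
# Inside a server block: reject any request whose User-Agent
# header mentions GPTBot (case-insensitive regex match).
if ($http_user_agent ~* "GPTBot") {
    return 403;
}
```

Unlike robots.txt, a 403 is enforced rather than requested, though user-agent strings are trivially spoofed, so this only stops crawlers that identify themselves honestly.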

This incident highlights a broader issue in the AI industry, where companies are creating massive databases by scraping data from websites without permission. The problem is exacerbated by the lack of a centralized opt-out tool, which would allow website owners to protect their content more easily. OpenAI has promised to deliver such a tool but has yet to do so.

The implications of this incident are far-reaching, particularly for small online businesses that may not have the resources to protect themselves from AI bots. Tomchuk warns that most sites remain unaware that they have been scraped by these bots, emphasizing the need for greater awareness and vigilance. The incident also underscores the importance of stricter regulations and accountability in the AI industry to prevent such abuses of power.
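Checking whether a site has already been visited by these bots is usually as simple as searching its access logs for the user-agent strings the major AI crawlers publish. A minimal Python sketch, assuming a standard combined-format log at a hypothetical path:

```python
import re
from collections import Counter

# Hypothetical path to a combined-format web server access log.
LOG_PATH = "/var/log/nginx/access.log"

# User-agent substrings of a few publicly documented AI crawlers.
AI_CRAWLERS = ["GPTBot", "ClaudeBot", "CCBot", "Bytespider"]

hits = Counter()
ips = {name: set() for name in AI_CRAWLERS}

with open(LOG_PATH, encoding="utf-8", errors="replace") as log:
    for line in log:
        for name in AI_CRAWLERS:
            if name in line:
                hits[name] += 1
                # The client IP is the first field in combined log format.
                match = re.match(r"(\S+)", line)
                if match:
                    ips[name].add(match.group(1))

for name in AI_CRAWLERS:
    print(f"{name}: {hits[name]} requests from {len(ips[name])} IPs")
```

The distinct-IP count is the telltale: it was the spread of roughly 600 source addresses that made Triplegangers' traffic look like a DDoS in the first place.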

The incident involving OpenAI's bot and Triplegangers' ecommerce site is a wake-up call for the tech industry: without greater transparency, accountability, and regulation in the AI sector, website owners have little defense against the exploitation of their intellectual property. As Tomchuk puts it, "They should be asking permission, not just scraping data."

