GitHub Data Exposed: Microsoft Copilot AI Chatbot Retains Access to Private Repositories

Elliot Kim

February 26, 2025 · 4 min read

Security researchers have sounded the alarm over a concerning vulnerability in Microsoft's Copilot AI chatbot: it retains access to GitHub repositories even after they have been made private. As a result, sensitive data from major companies, including Amazon, Google, and Microsoft itself, can be exposed to anyone using the chatbot.

The discovery was made by Lasso, an Israeli cybersecurity company focused on emerging generative AI threats. According to Lasso's findings, thousands of once-public GitHub repositories from some of the world's biggest companies are affected, including Microsoft's own repositories. Lasso itself found that content from one of its own GitHub repositories, which had been mistakenly made public for a brief period, was still accessible through Copilot even after it was set back to private.

Lasso co-founder Ophir Dror explained that the repository, which had been indexed and cached by Microsoft's Bing search engine, could be accessed through Copilot by asking the right question. "If I was to browse the web, I wouldn't see this data. But anyone in the world could ask Copilot the right question and get this data," Dror said.

Lasso's investigation revealed that any data on GitHub, even if it was public for only a brief moment, could potentially be exposed by tools like Copilot. The company extracted a list of repositories that were public at any point in 2024 and identified those that had since been deleted or set to private. Using Bing's caching mechanism, Lasso found that more than 20,000 since-private GitHub repositories still had data accessible through Copilot, affecting more than 16,000 organizations.
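
As a rough illustration of part of that pipeline, the Python sketch below (not Lasso's actual tooling; the repository list and token are hypothetical placeholders) flags once-public repositories that GitHub's REST API now answers with a 404, the status it returns to outside callers for both deleted and private repositories:

```python
# Rough sketch of one step in the pipeline described above: given names of
# repositories that were public at some point, flag those GitHub's REST API
# now answers with 404 -- the status outsiders see for both deleted and
# private repositories. REPOS is a hypothetical input list; GITHUB_TOKEN is
# optional and only raises the API rate limit.
import requests

REPOS = ["example-org/example-repo"]  # hypothetical once-public repos
GITHUB_TOKEN = ""  # optional personal access token

def now_inaccessible(full_name: str) -> bool:
    """True if the repository is no longer publicly visible on GitHub."""
    headers = {"Authorization": f"Bearer {GITHUB_TOKEN}"} if GITHUB_TOKEN else {}
    resp = requests.get(
        f"https://api.github.com/repos/{full_name}", headers=headers, timeout=10
    )
    return resp.status_code == 404  # deleted or private: a "zombie" candidate

candidates = [name for name in REPOS if now_inaccessible(name)]
print(f"{len(candidates)} once-public repositories are now deleted or private")
```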

Affected organizations include Amazon Web Services, Google, IBM, PayPal, Tencent, and Microsoft itself. For some affected companies, Copilot could be prompted to return confidential GitHub archives that contain intellectual property, sensitive corporate data, access keys, and tokens. Lasso noted that it used Copilot to retrieve the contents of a GitHub repo – since deleted by Microsoft – that hosted a tool allowing the creation of "offensive and harmful" AI images using Microsoft's cloud AI service.

Lasso reached out to all affected companies, advising them to rotate or revoke any compromised keys. However, none of the companies it named responded to questions, and Microsoft likewise did not respond to inquiries.
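
For teams acting on that advice, the minimal Python sketch below illustrates one form the triage can take: scanning a local clone for common credential patterns so that any matching keys can be rotated or revoked. The path and regexes are illustrative assumptions, not Lasso's tooling; dedicated scanners such as gitleaks or trufflehog also cover the full git history:

```python
# Minimal sketch of credential triage: scan a local clone for common key
# patterns so matching secrets can be rotated or revoked. REPO_DIR and the
# regexes are illustrative assumptions, not an exhaustive secret scanner.
import re
from pathlib import Path

REPO_DIR = Path("./my-repo")  # hypothetical local clone

PATTERNS = {
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    "GitHub personal access token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "generic secret assignment": re.compile(
        r"(?i)(api[_-]?key|secret|token)\s*[:=]\s*['\"][^'\"]{16,}['\"]"
    ),
}

for path in REPO_DIR.rglob("*"):
    # Skip directories and git internals (packfiles are binary).
    if not path.is_file() or ".git" in path.parts:
        continue
    try:
        text = path.read_text(errors="ignore")
    except OSError:
        continue
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            # Print only a prefix so the hit itself is not re-leaked in logs.
            print(f"{path}: possible {label}: {match.group(0)[:12]}...")
```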

Lasso informed Microsoft of its findings in November 2024, but the tech giant classified the issue as "low severity," describing the caching behavior as "acceptable." Microsoft stopped including links to Bing's cache in its search results in December 2024, but Lasso says that even with the caching feature disabled, Copilot could still access data that was no longer visible through traditional web searches, suggesting the fix was only a temporary one.

The implications of this vulnerability are far-reaching, raising concerns about the security of AI chatbots and their ability to retain sensitive data long after it has been pulled from public view. As the use of AI chatbots becomes more widespread, it is essential that companies like Microsoft ensure their tools do not inadvertently expose sensitive information.

For now, the onus is on affected companies to take action to protect their data. As Dror warned, "Anyone in the world could ask Copilot the right question and get this data." It's a sobering reminder of the importance of data privacy and security in the age of AI.
