News that Anthropic's AI chatbot, Claude, was key to a 150 GB hack of the Mexican government broke just as the US reconsiders a contract with the company.
On February 25, 2026, news broke of a massive data breach at the Mexican government, powered by Anthropic's artificial intelligence (AI) chatbot, Claude. A single hacker stole 150 GB of private data from several government agencies, including voter records, 195 million taxpayer records, civil registry files, and even employee credentials, simply by manipulating the chatbot.
Bloomberg reported that the hacker wrote in Spanish while instructing Claude to “act as an elite hacker, finding vulnerabilities in government networks, writing computer scripts to exploit them and determining ways to automate data theft,” framing the work as part of a fictitious bug bounty program (an initiative that rewards security researchers for finding and reporting vulnerabilities in key systems and software). The chatbot reportedly refused the original prompts because they violated its AI safety guidelines, but completed the commands after persistent requests. In addition to Claude, the hacker used ChatGPT, switching to one chatbot whenever the other’s progress was stopped by guardrails intended to prevent this style of attack.
This AI “jailbreaking” essentially allowed the hacker to break into the systems without doing the actual work of writing code or testing systems. Curtis Simpson, the chief strategy officer at Gambit Security (the company that researched the breach), explained that after being prodded by the attacker, Claude “produced thousands of detailed reports that included ready-to-execute plans, telling the human operator exactly which internal targets to attack next and what credentials to use”. These reports helped automate several cyberattacks against the Mexican government between December 2025 and January 2026.
Gambit researchers found at least 20 specific exploited vulnerabilities in their review of the attack, and determined that the Mexican national government, alongside the state governments of Jalisco, Tamaulipas, México, and Michoacán, were among the compromised institutions. Some federal agencies, like Mexico’s electoral institute, denied that there had been any breaches or unauthorized access in recent months, and Jalisco state similarly denied any breach. In late December 2025, the Mexican government issued a press release about a security incident involving personal data, but it is not clear whether this was related to the Claude-powered attacks.
AI-enabled cyberattacks are not a new concern. It’s intuitive that a tool that makes many programming-related tasks easier also lowers the skill threshold required to organize complex cyberattacks. Additionally, AI allows for the automation of several simultaneous attacks, a feat that would otherwise be difficult for a single threat actor. AI systems typically have some guardrails against malicious use, but as seen in the Claude attack, they can be overridden with careful prompting. In 2025, Claude had been used in a separate set of cyberattacks, that time by state-sponsored hackers in China, in a scheme that Anthropic called “the first documented case of a large-scale cyberattack executed without substantial human intervention”.
In response to the attack on Mexico, OpenAI explained that it had detected the hacker’s attempts to override its usage policies, but that those attempts were unsuccessful. Anthropic tracked down the hack-related activity and banned the accounts running the prompts, before training its latest AI model, Claude Opus 4.6, to recognize this malicious activity and putting probes in place to prevent further abuse.
Curiously enough, Anthropic edited their Responsible Scaling Policy (RSP) this week to remove their promise not to release AI models without a guarantee of proper risk mitigations beforehand. In an interview with TIME, Jared Kaplan, the chief science officer of Anthropic, discussed why: “We didn’t really feel, with the rapid advance of AI, that it made sense for us to make unilateral commitments … if competitors are blazing ahead.” In the newest version of the RSP, Anthropic maintains that it will “delay” AI development if the risks of catastrophe are determined to be significant, but the change to the RSP is still a departure from the safety-conscious approach the company previously touted.
Anthropic’s pivot to prioritizing AI advancement is likely also tied to the United States AI market and federal cybersecurity. In the US, there is no federal-level regulation of AI, a stance framed as necessary “to advance US AI leadership.” States have been discouraged from creating “onerous” AI regulations, despite several bipartisan efforts to regulate it. This week, Anthropic made headlines again for a conflict with Pete Hegseth, the US Secretary of War. Like Google, OpenAI, and xAI, Anthropic received a $200 million contract to work with the Pentagon last summer. Anthropic was named the most secure and was the first to be cleared for use, leading to Claude being used in the raid on Venezuela that captured former president Nicolás Maduro. Now, in a “very aggressive” approach, Hegseth is threatening to blacklist Anthropic if it does not roll back, by February 27, 2026, some of its safety standards, standards that would prevent AI-directed warfare and domestic mass surveillance. While it doesn’t appear that Anthropic will concede, the RSP change may be a slight concession in response to the pressure.
The Anthropic data breach will likely have serious consequences for both the US and Mexico. Mexico identified AI as both a threat to and a possible tool for national cyberdefense in its December 2025 National Cybersecurity Plan for 2025 to 2030. While Mexico does not plan to integrate AI tools into federal systems until 2027, the use of Claude to infiltrate critical databases may stir up more safety concerns. In the US, the data breach is shedding light on the risks of AI, just as certain parts of the federal government push for even less AI regulation. In both cases, any time spent weighing the safe use of these tools will likely be time well spent.