Anthropic, a US company that develops artificial intelligence models, revealed on Thursday that a group of hackers used one of its products, Claude Code, to carry out a wide-ranging cyberattack. According to Anthropic, the group is funded by the Chinese state and the attacks targeted around thirty large technology companies, financial institutions, chemical companies and government agencies: in at least four cases, the attacks were successful.
Although there have been other cases in the last two years where artificial intelligence was used to carry out cyberattacks, this appears to be the first in which the model carried out most of the actions autonomously, with minimal human involvement. Claude Code (similar to the chatbot Claude, but specialized in programming) did more than 80 percent of the work a human hacker would typically do, and in much less time. Anthropic said in a statement that the programming capabilities of artificial intelligence systems have developed faster than it expected.
Anthropic explained that it became aware of the operation in September, that it opened an internal investigation to understand how it happened, and that it notified the affected organizations (whose names were not disclosed). The company said it decided to publicize the case to alert companies and governments and advise them to improve their cyber defenses, because it believes attacks of this kind will become more frequent and easier to carry out. It added that it has started working to prevent, as far as possible, its models from being used in this way.
This attack was made possible thanks to three characteristics of artificial intelligence models developed over the past year: the ability to execute complex instructions and highly sophisticated tasks; the ability to act as an agent, i.e. to perform autonomous actions and chain tasks starting from a single initial human command; the ability to use other software available online to their advantage.
In its analysis, Anthropic divides the attack into three phases. In the first, the human hackers select the targets they want to attack (for example the government agency to be compromised) and develop an attack plan that relies on Claude Code to carry out all the most time-consuming operations. They then circumvented Claude’s safety protocols, which were supposed to prevent the system from being used in this kind of operation: they did so by giving it many small tasks without telling it their overall purpose, and by convincing Claude that it was working for a legitimate IT company conducting defensive tests, of the kind companies commission to understand how well protected they are against cyberattacks.
The second phase identified by Anthropic is the one in which Claude begins to work autonomously: the model examines the portals of the targeted companies and institutions, identifies their vulnerabilities and then tries, through those vulnerabilities, to break in and download sensitive data, in some cases succeeding. According to Anthropic, the program identifies the accounts with the highest privileges on each platform, creates backdoors (essentially a means of secretly accessing a system) and then covertly exfiltrates data.
This is the part of a cyberattack that usually takes a long time and is often carried out by many people at once. Claude, however, was able to do it much faster and on its own, making “thousands of requests per second”.
In the final phase, Claude produced, at the hackers’ request, documentation summarizing the attack, the stolen credentials that had been used and the objectives achieved, adding suggestions on how to proceed in future.
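The request rate quoted above is the one property of this kind of automated activity that a defender can measure directly in ordinary web server logs. The following is an added illustration, not code from the article or from Anthropic: it reads a handful of constructed access-log lines (a fictitious format and fictitious addresses, invented here for the example) and flags any source that exceeds a chosen number of requests within a single second. The threshold, the log format and the addresses are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime

# Constructed sample log lines (not taken from the article): an ordinary client
# making a couple of requests, followed by one source issuing a burst of 120
# automated requests inside the same second.
SAMPLE_LOG = [
    "2025-09-14T10:00:00Z 203.0.113.7 GET /login 200",
    "2025-09-14T10:00:03Z 203.0.113.7 GET /account 200",
] + [
    f"2025-09-14T10:00:05Z 198.51.100.9 GET /api/v1/users?page={i} 200"
    for i in range(120)
]


def parse_line(line):
    """Split one constructed log line into (timestamp, source_ip, path, status).

    The assumed format is: ISO-8601 timestamp, client IP, method, path, status.
    """
    ts, ip, _method, path, status = line.split(" ", 4)
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ"), ip, path, int(status)


def requests_per_second(log_lines):
    """Count requests in each (source_ip, one-second bucket)."""
    counts = defaultdict(int)
    for line in log_lines:
        t, ip, _path, _status = parse_line(line)
        counts[(ip, t.replace(microsecond=0))] += 1
    return counts


def flag_burst_sources(log_lines, threshold=50):
    """Return {source_ip: peak_requests_in_one_second} for every source whose
    rate in any single second exceeds `threshold` (an assumed, tunable value)."""
    peaks = {}
    for (ip, _second), n in requests_per_second(log_lines).items():
        if n > threshold:
            peaks[ip] = max(n, peaks.get(ip, 0))
    return peaks


if __name__ == "__main__":
    # With the sample above only the bursting source is reported.
    print(flag_burst_sources(SAMPLE_LOG))
```

A single-second window is deliberately crude: a real deployment would also look at per-path diversity, authentication failures and the spread of source addresses, since automated tooling can be throttled or distributed to stay under any fixed per-source rate.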
