When LLMs autonomously attack
Carnegie Mellon researchers show how LLMs can be taught to autonomously plan and execute real-world cyberattacks against enterprise-grade network environments—and why this matters for future defenses.
In a groundbreaking development, a team of Carnegie Mellon University researchers has demonstrated that large language models (LLMs) are capable of autonomously planning and executing complex network attacks, shedding light on emerging capabilities of foundation models and their implications for cybersecurity research.
The project, led by Brian Singer, a Ph.D. candidate in electrical and computer engineering (ECE), explores how LLMs—when equipped with structured abstractions and integrated into a hierarchical system of agents—can function not merely as passive tools, but as active, autonomous red team agents capable of coordinating and executing multi-step cyberattacks without detailed human instruction.
“Our research aimed to understand whether an LLM could perform the high-level planning required for real-world network exploitation, and we were surprised by how well it worked,” said Singer. “We found that by providing the model with an abstracted ‘mental model’ of network red teaming behavior and available actions, LLMs could effectively plan and initiate autonomous attacks through coordinated execution by sub-agents.”
Moving beyond simulated challenges
Prior work in this space had focused on how LLMs perform in simplified “capture-the-flag” (CTF) environments—puzzles commonly used in cybersecurity education.
Singer’s research advances this work by evaluating LLMs in realistic enterprise network environments and considering sophisticated, multi-stage attack plans.
State-of-the-art, reasoning-capable LLMs equipped only with common knowledge of computer security tools failed miserably at the challenges. However, when these same LLMs, and smaller LLMs as well, were “taught” a mental model and abstraction of security attack orchestration, they showed dramatic improvement.
Rather than requiring the LLM to execute raw shell commands—often a limiting factor in prior studies—this system provides the LLM with higher-level decision-making capabilities while delegating low-level tasks to a combination of LLM and non-LLM agents.
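To make that division of labor concrete, the sketch below illustrates, in Python, one way such a hierarchy could be wired together: a planner function standing in for the LLM selects abstract actions, and a sub-agent translates each action into low-level steps. The function names, action vocabulary, and classes are hypothetical illustrations, not code from the team's system.

```python
# A minimal sketch (not the CMU team's code) of the hierarchical idea described
# above: a top-level "planner" LLM chooses abstract red-team actions, and
# sub-agents translate each abstract action into concrete, low-level steps.
# llm_plan() and the action vocabulary are hypothetical stand-ins.

from dataclasses import dataclass


@dataclass
class AbstractAction:
    """High-level action in the planner's vocabulary, not a raw shell command."""
    name: str    # e.g. "scan_subnet", "exploit_service", "exfiltrate"
    target: str  # host or subnet the action applies to


def llm_plan(observations: list[str]) -> AbstractAction:
    """Placeholder for a call to a reasoning-capable LLM that, given the
    abstracted state of the engagement, returns the next high-level action."""
    # A real system would prompt an LLM with the "mental model" of red teaming;
    # here we return a canned action so the sketch runs.
    return AbstractAction(name="scan_subnet", target="10.0.0.0/24")


class ScannerAgent:
    """Non-LLM sub-agent: turns an abstract scan request into tool invocations."""
    def execute(self, action: AbstractAction) -> list[str]:
        # A real sub-agent might drive a network scanner here; this one simulates.
        return [f"host discovered on {action.target} (simulated)"]


SUB_AGENTS = {"scan_subnet": ScannerAgent()}


def run_episode(steps: int = 3) -> None:
    observations: list[str] = ["engagement started"]
    for _ in range(steps):
        action = llm_plan(observations)        # high-level decision by the planner
        agent = SUB_AGENTS.get(action.name)
        if agent is None:
            break                              # planner chose an unsupported action
        observations += agent.execute(action)  # low-level execution, results fed back


if __name__ == "__main__":
    run_episode()
```

The point of the abstraction is visible in the loop: the planner never sees or emits shell commands, only named actions and summarized observations.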
Experimental evaluation: The Equifax case
To rigorously evaluate the system’s capabilities, the team recreated the network environment associated with the 2017 Equifax data breach—a massive security failure that exposed the personal data of nearly 150 million Americans—by incorporating the same vulnerabilities and topology documented in Congressional reports. Within this replicated environment, the LLM autonomously planned and executed the attack sequence, including exploiting vulnerabilities, installing malware, and exfiltrating data.
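For illustration only, the snippet below shows one way a multi-stage plan of this kind could be encoded so that each stage unlocks only after its prerequisites complete. The stage names are generic placeholders, not details drawn from the replicated Equifax environment or the Congressional reports.

```python
# A hedged illustration (not the actual testbed) of representing a multi-stage
# attack plan so a planner must finish each stage before the next is available.

ATTACK_STAGES = [
    ("exploit_vulnerability", []),                       # no prerequisites
    ("install_malware",       ["exploit_vulnerability"]),
    ("exfiltrate_data",       ["install_malware"]),
]


def next_available_stages(completed: set[str]) -> list[str]:
    """Return stages whose prerequisites are all satisfied and not yet done."""
    return [
        name for name, prereqs in ATTACK_STAGES
        if name not in completed and all(p in completed for p in prereqs)
    ]


# Example: after exploitation succeeds, only malware installation is unlocked.
assert next_available_stages({"exploit_vulnerability"}) == ["install_malware"]
```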
“The fact that the model was able to successfully replicate the Equifax breach scenario without human intervention in the planning loop was both surprising and instructive,” said Singer. “It demonstrates that, under certain conditions, these models can coordinate complex actions across a system architecture.”
Implications for security testing and autonomous defense
While the findings underscore potential risks associated with LLM misuse, Singer emphasized the constructive applications for organizations seeking to improve security posture.
“Right now, only big companies can afford to run professional tests on their networks via expensive human red teams, and they might only do that once or twice a year,” he explained. “In the future, AI could run those tests constantly, catching problems before real attackers do. That could level the playing field for smaller organizations.”
The research team includes Singer; Keane Lucas, a CyLab alumnus now at Anthropic; Lakshmi Adiga, an undergraduate ECE student; Meghna Jain, an ECE master's student; Lujo Bauer of ECE and the CMU Software and Societal Systems Department (S3D); and Vyas Sekar of ECE. Bauer and Sekar are co-directors of the CyLab Future Enterprise Security Initiative, which supported the students involved in this research.
The team is now pursuing follow-up work focused on autonomous defenses, exploring how LLM-based agents might be used to detect, contain, or counteract automated attacks. Early experiments involve simulated AI-versus-AI scenarios designed to study the dynamics between offensive and defensive LLM agents.
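As a rough sketch of what such an AI-versus-AI experiment might look like, assuming attacker and defender agents alternate turns on a shared simulated network, the Python below pits two placeholder agents against each other; none of these classes reflect the team's actual experiments.

```python
# A minimal sketch, assuming "AI-versus-AI" means an attacker agent and a
# defender agent acting in alternating turns on a shared simulated environment.
# All classes here are hypothetical stand-ins for LLM-driven agents.

import random


class SimulatedNetwork:
    def __init__(self) -> None:
        self.compromised_hosts: set[str] = set()

    def compromise(self, host: str) -> None:
        self.compromised_hosts.add(host)

    def quarantine(self, host: str) -> None:
        self.compromised_hosts.discard(host)


class AttackerAgent:
    """Stand-in for an offensive LLM agent choosing a host to target."""
    def act(self, env: SimulatedNetwork) -> None:
        env.compromise(random.choice(["web-01", "db-01", "auth-01"]))


class DefenderAgent:
    """Stand-in for a defensive LLM agent detecting and containing intrusions."""
    def act(self, env: SimulatedNetwork) -> None:
        if env.compromised_hosts:
            env.quarantine(next(iter(env.compromised_hosts)))


def run_rounds(rounds: int = 5) -> int:
    env, attacker, defender = SimulatedNetwork(), AttackerAgent(), DefenderAgent()
    for _ in range(rounds):
        attacker.act(env)  # offense moves first each round
        defender.act(env)  # defense responds
    return len(env.compromised_hosts)  # hosts still compromised after the game


if __name__ == "__main__":
    print(f"Compromised hosts remaining: {run_rounds()}")
```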
The research has attracted interest from both industry and academic audiences. An early version of the work was presented at a security-focused workshop hosted by OpenAI on May 1.
The project was conducted in collaboration with Anthropic, which provided model use credits and consultation but did not directly fund the study. The resulting paper has been cited in several industry security reports and is already being referenced in the system cards published by frontier model vendors.
A research prototype—not a general threat
Singer cautioned that the system is still a research prototype and not yet ready for widespread deployment in uncontrolled environments.
“It only works under specific conditions, and we do not have something that could just autonomously attack the internet,” he said. “But it’s a critical first step.”
As LLM capabilities continue to evolve, the team’s work underscores the importance of rigorous, proactive research into how these systems behave in complex, real-world scenarios—and how that knowledge can inform both policy and practice in cybersecurity.