Safety in Self-Evolving LLM Agent Systems: Threats, Amplification, and Case Studies
2026-06-22 • Cryptography and Security
Cryptography and Security, Artificial Intelligence
AI summary
The authors studied large language model (LLM) systems that can update themselves automatically, finding that this self-evolution creates unique security risks. They broke down these risks into a grid of system parts and lifecycle phases, discovering many serious threats with no current fixes. They also found that problems can grow and combine in ways that cannot be stopped by fixing just one part of the system. Tests on real examples showed that self-evolving designs are much easier to attack and that attacks tend to persist indefinitely. The authors argue that new security methods aware of the evolving nature of these systems are needed.
LLM agent systems, self-evolution, attack surface, security analysis, module-lifecycle matrix, adversarial threats, persistent attacks, formal verification, self-modifying systems, cybersecurity
Authors
Ruixiao Lin, Xinhao Deng, Qingming Li, Jianan Ma, Yunhao Feng, Yuqi Qing, Zhenyuan Li, Yechao Zhang, Shiwen Cui, Changhua Meng, Tianwei Zhang, Xingjun Ma, Qi Li, Ke Xu, Shouling Ji
Abstract
Self-evolving LLM agent systems, which autonomously update their model parameters, memory, tools, and architectures, introduce a qualitatively new threat landscape in which adversarial influences become permanently encoded, self-amplify across generations, and propagate through populations without sustained attacker access. We present a systematic security and privacy analysis organized around the Module-Lifecycle Attack Surface (MLAS) matrix, which decomposes the attack surface into five functional modules (Brain, Cognitive Resource, Execution, Self-Design, Collective) $\times$ five lifecycle stages (Bootstrap, Propose, Evaluate, Commit, Serve). Analysis of the resulting 25 cells reveals that 17 face critical threats for which no effective partial mitigation exists. We identify seven cross-cutting amplification effects that interact synergistically and cannot be addressed by securing individual modules in isolation. Comparative case studies of two open-source frameworks demonstrate that evolution-native design activates $3.5\times$ more attack surface cells and achieves a 100% attack persistence rate (40/40 payloads across all CIA+Privacy categories), while co-located security scanners block only 2.5% of attacks. Our findings establish that self-evolution converts every known attack category from session-bounded to lineage-persistent, gives rise to entirely new attack classes, and renders static defenses structurally inadequate, motivating evolution-aware security frameworks and formal verification for self-modifying systems.
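As an illustration of the MLAS decomposition described in the abstract, the matrix can be enumerated as the Cartesian product of the five modules and five lifecycle stages. The following is a minimal sketch, not the authors' implementation: the names `MODULES`, `STAGES`, `MLAS_CELLS`, and `critical_fraction` are hypothetical, and only the module names, stage names, and the counts 25 and 17 come from the abstract. The abstract does not say which cells are critical, so the sketch works with the count alone.

```python
from itertools import product

# Module and stage names as listed in the abstract.
MODULES = ("Brain", "Cognitive Resource", "Execution", "Self-Design", "Collective")
STAGES = ("Bootstrap", "Propose", "Evaluate", "Commit", "Serve")

# Hypothetical representation: each (module, stage) pair is one matrix cell.
MLAS_CELLS = list(product(MODULES, STAGES))


def critical_fraction(n_critical: int, n_cells: int = len(MLAS_CELLS)) -> float:
    """Share of cells facing critical threats, given only the reported count."""
    return n_critical / n_cells


if __name__ == "__main__":
    print(len(MLAS_CELLS))                  # number of cells in the 5 x 5 matrix
    print(f"{critical_fraction(17):.0%}")   # share of cells reported as critical
```

Under these assumptions, the 17 critical cells reported in the abstract correspond to 68% of the 25-cell matrix.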