Dynamic Malicious Skills in Agentic AI

2026-06-15 · Cryptography and Security

AI summary

The authors studied how AI agents that use 'skills' to perform tasks can be tricked by malicious instructions hidden in a skill's documentation. These instructions can lead the agent to add harmful behavior to an otherwise benign skill while it is running. They tested the attack on agent frameworks such as OpenHands and Claude Code and found that it succeeds at non-trivial rates. As a defense, the authors propose making skill files read-only through operating system kernel-enforced read-only mounts, which blocks these runtime modifications while letting benign skills keep working.

agentic AI, skills, malicious instructions, dynamic code injection, attack surface, AI frameworks, read-only mounts, operating system kernel, security defense
Authors
Tianhao Chen, Zhengyuan Jiang, Yuepeng Hu, Yebei Gou, Neil Zhenqiang Gong
Abstract
Skills are a key enabling component of agentic AI. While they enhance agents' capabilities, they also introduce new attack surfaces. In this work, we investigate one such attack surface by demonstrating dynamic malicious skills. By embedding malicious instructions in natural-language documentation (e.g., SKILL.md), an attacker can induce an agent to dynamically inject malicious logic into an otherwise benign skill during execution. We evaluate this attack across agentic frameworks such as OpenHands and Claude Code, showing that dynamic malicious skills can successfully introduce a range of malicious behaviors at runtime with non-trivial success rates. To mitigate this vulnerability, we propose a system-level defense that prevents dynamic modification of skills using operating system kernel-enforced read-only mounts. Our evaluation demonstrates that this defense effectively blocks dynamic malicious skills while preserving the functionality of benign skills.
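The abstract does not say how the read-only mounts are configured. As a minimal sketch, assuming a Linux host where an agent's skills live under a single directory (the path `/opt/agent/skills` and the helper names below are hypothetical, not taken from the paper), a launcher could expose that directory through a read-only bind mount and refuse to start the agent unless the kernel reports the mount as read-only:

```python
"""Sketch: keep an agent's skills directory on a kernel-enforced read-only mount.

Assumptions (not from the paper): a Linux host, a hypothetical skills
directory path, and a launcher that refuses to start the agent otherwise.
"""
from __future__ import annotations

import os
import shlex


def read_only_bind_mount_commands(skills_dir: str, mount_point: str) -> list[list[str]]:
    """Return mount(8) invocations that expose skills_dir read-only at mount_point.

    A plain `mount --bind` inherits the source's writability, so a second
    remount with `ro` is needed to make the bind mount itself read-only.
    Running these commands requires root (CAP_SYS_ADMIN).
    """
    return [
        ["mount", "--bind", skills_dir, mount_point],
        ["mount", "-o", "remount,bind,ro", mount_point],
    ]


def is_on_read_only_mount(path: str) -> bool:
    """True if the kernel reports the mount containing `path` as read-only."""
    return bool(os.statvfs(path).f_flag & os.ST_RDONLY)


def check_skills_dir(path: str) -> None:
    """Raise unless the skills directory sits on a read-only mount."""
    if not is_on_read_only_mount(path):
        raise PermissionError(f"skills directory {path!r} is writable; refusing to launch agent")


if __name__ == "__main__":
    skills = "/opt/agent/skills"  # hypothetical path; bind-mounted onto itself
    # Print the commands an administrator would run (as root) before launch.
    for cmd in read_only_bind_mount_commands(skills, skills):
        print(shlex.join(cmd))
```

Because the kernel, not the agent, enforces the `ro` flag, any attempt by the agent to open a skill file for writing fails with `EROFS`, whatever the skill's documentation instructs. This holds only if the agent process lacks the privilege to remount the directory; the check itself needs no special privileges.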