Seeing Is Not Screening: Multimodal Hidden Instruction Attacks on Agent Skill Scanners

2026-06-16Cryptography and Security

Cryptography and SecurityComputer Vision and Pattern Recognition
AI summary

The authors studied how current security checks for AI skills mainly look at text and code but often miss hidden harmful instructions inside images. They created SkillCamo, a way to hide bad commands inside images combined with text that looks normal, tricking AI systems into following harmful instructions. To stop this, the authors developed ExecScan, a new tool that looks at text, code, images, and how they work together to find hidden harmful instructions and predict dangerous behaviors. Their tests show that while current scanners struggle with these image-hidden attacks, ExecScan does a better job detecting them.

LLM-based systemsagent skillsmultimodal agentsSkillCamomalicious instructionsexecution-grounded scanningExecScanbehavior reconstructionsecurity analysisprivilege escalation
Authors
Xiaojun Jia, Jie Liao, Simeng Qin, Ke Ma, Wenbo Guo, Yebo Feng, Aishan Liu, Yang Liu
Abstract
Agent skills are emerging as an important attack surface in LLM-based systems. Through an empirical study of existing skill scanners, we find that current defenses primarily rely on textual descriptions, manifests, and source code as the main signals for security analysis, which can leave visually conveyed malicious intent insufficiently examined. This creates a practical blind spot: harmful operational instructions hidden in images may bypass scanning while still being recoverable by multimodal agents during deployment. To systematically investigate this threat, we propose SkillCamo, a document-mediated multimodal instruction attack that conceals malicious instructions within images bundled with a skill while rewriting the surrounding documentation to naturally reference those images as part of the normal workflow. Thus, the attack does not rely on the image alone, but on the joint interpretation of textual guidance and visual payload at execution time. To defend against such attacks, we further propose ExecScan, an execution-grounded multimodal scanning module that performs intent extraction, behavior reconstruction, abuse assessment, and deliberative execution simulation over skill artifacts. ExecScan jointly analyzes documentation, code, referenced resources, and visual content to recover hidden instructions, reconstruct executable behavior chains, and identify downstream risks such as exfiltration, destruction, persistence, deception, and privilege escalation. Extensive experiments show that image-hidden malicious instructions challenge existing skill scanners, while ExecScan can improve the skill scanning performance.