Partially Observable Adversarial Patch Attacks on Vision-Language-Action Models in Robotics

2026-06-02 | Robotics

AI summary

The authors studied how robots driven by vision-language-action models can be tricked by special images called adversarial patches. Unlike past work that required access to the robot's entire execution trajectory, they focused on a more realistic case where the attacker sees only the first few steps of that trajectory. Their method first uses the model's attention maps to find the regions of the robot's camera view that matter most for the instruction, and then designs a patch that disrupts both the robot's recognition of target objects and the smoothness of its movements. Their tests in simulation and on real robots showed that these patches can substantially reduce task success rates even with this limited information.

Vision-language-action models, Adversarial patches, Partial observability, Semantic grounding, Robotic control, Attention maps, Trajectory optimization, Simulation, Robot perception, Adversarial attack
Authors
Xiaofei Wang, Mingliang Han, Tianyu Hao, Yi Yang, Yun-Bo Zhao, Keke Tang
Abstract
Vision-language-action (VLA) models are gaining attention in robotics, yet their robustness to adversarial attacks remains largely unexplored. Existing work shows that adversarial patches can mislead VLA-based robots but assumes full access to the entire execution trajectory, an unrealistic requirement in practice. We address this limitation by formulating a partially observable threat model, where the adversary can exploit only a short prefix of the trajectory to generate a fixed patch applied to all subsequent frames. Under this setting, we propose a two-phase framework. First, we localize the patch using the model's attention maps to identify visually critical regions that correspond to the full instruction. Then, we optimize the patch to disrupt the semantic grounding of target objects and increase the curvature of action trajectories, thereby compounding failures in both perception and control. Extensive experiments in simulation and real-world robotic environments show that our method sustains adversarial effects under partial observability, inducing long-horizon disruptions and significantly reducing task success rates.
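The two phases described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes the attention map is already available as a 2D array over image pixels, places the patch at the window with the largest summed attention, and uses the mean squared second difference of the action sequence as a stand-in for trajectory curvature. The function names `locate_patch` and `curvature`, the sliding-window rule, and the curvature proxy are all hypothetical choices made for illustration; the abstract does not specify them.

```python
import numpy as np


def locate_patch(attn, patch_h, patch_w):
    """Return the (row, col) top-left corner of the patch_h x patch_w window
    with the largest summed attention.

    Assumption: `attn` is a 2D array of non-negative attention scores with the
    same spatial layout as the camera image.
    """
    attn = np.asarray(attn, dtype=float)
    # Integral image with a leading row and column of zeros, so that every
    # window sum can be read off from four corner values.
    integral = np.pad(attn, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)
    sums = (
        integral[patch_h:, patch_w:]
        - integral[:-patch_h, patch_w:]
        - integral[patch_h:, :-patch_w]
        + integral[:-patch_h, :-patch_w]
    )
    row, col = np.unravel_index(np.argmax(sums), sums.shape)
    return int(row), int(col)


def curvature(actions):
    """Curvature proxy for an action sequence of shape (T, D): the mean squared
    second difference. It is zero for a straight, evenly spaced trajectory and
    grows as the trajectory bends or oscillates.

    Assumption: this proxy stands in for whatever curvature term the paper
    actually optimizes.
    """
    actions = np.asarray(actions, dtype=float)
    second_diff = actions[2:] - 2.0 * actions[1:-1] + actions[:-2]
    return float(np.mean(np.sum(second_diff ** 2, axis=1)))


# Toy attention map: all mass sits in a 2x2 block at rows 3-4, columns 5-6.
attn_map = np.zeros((8, 8))
attn_map[3:5, 5:7] = 1.0
patch_corner = locate_patch(attn_map, 2, 2)

# A straight trajectory versus a zigzag one.
straight = np.stack([np.arange(5), 2 * np.arange(5)], axis=1)
zigzag = np.array([[0, 0], [1, 1], [2, 0], [3, 1]])
straight_score = curvature(straight)
zigzag_score = curvature(zigzag)
```

In an attack of the kind the abstract describes, a score like `curvature` would be one term of the patch objective, maximized alongside a term that degrades semantic grounding, with both computed only on the observed trajectory prefix. That gradient-based optimization loop depends on the specific VLA model and is omitted here.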