PATCH: Action-Chunk-Conditioned Latent Patch Innovation Monitoring for Robot Manipulation

2026-06-15 | Robotics

Robotics, Artificial Intelligence, Computer Vision and Pattern Recognition
AI summary

The authors created PATCH, a system that helps robots handle unexpected changes during tasks, like moving objects or sudden disturbances. PATCH watches small sections of the robot's camera view related to what the robot is currently doing and predicts how these parts should look if everything goes as planned. If it sees something unusual that can't be explained by the robot's own movements, it signals the robot to pause and recover before continuing. Their tests show PATCH works better than other monitors at spotting real problems and avoiding false alarms during robot manipulation tasks.

robot manipulation, runtime monitor, latent patch, execution corridor, intervention, recovery, visual innovation, policy resumption, disturbance detection, action chunk
Authors
Yanan Zhou, Ranpeng Qiu, Yincong Chen, Jiajie Cui, Weiming Zhi
Abstract
Learning-based manipulation policies have made substantial progress in real-world robot manipulation, particularly for short-horizon action generation. However, deployment in open workspaces remains fragile under unexpected local scene dynamics, such as moving objects, transient occlusions, or disturbances near the intended motion. Existing runtime monitors often rely on global observation anomalies, policy uncertainty, or frame-level visual changes, and struggle to distinguish task-relevant execution risk from benign visual variation. We introduce PATCH, an action-chunk-conditioned latent patch innovation monitor for deployment-time intervention. Given the active action chunk, PATCH defines a projected execution corridor, predicts latent patch evolution inside it, and accumulates persistent residuals unexplained by the robot's own motion. These residuals form a localized intervention signal that allows PATCH-Router to pause execution, select an available recovery source, and resume the original policy once localized innovation subsides. Experiments on real robot rollout data show that PATCH produces more stable and context-relevant triggers than competing runtime monitors. Real-robot deployment further demonstrates monitor-driven intervention and policy resumption for disturbance-aware manipulation. Project Page: https://yananzhou5555.github.io/PATCH/.
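To make the idea of accumulating persistent residuals concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes that patch latents are plain vectors, that the execution corridor is given as a list of patch indices, and that persistence is modeled with a leaky accumulator and two thresholds (pause and resume). The names `corridor_innovation` and `InnovationMonitor`, the L2 residual, and all parameter values are hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import math
from dataclasses import dataclass


def corridor_innovation(predicted, observed, corridor):
    """Mean L2 residual between predicted and observed patch latents,
    computed only over the patch indices inside the corridor.
    (Hypothetical residual definition; the paper may use a different one.)"""
    norms = [math.dist(predicted[i], observed[i]) for i in corridor]
    return sum(norms) / len(norms) if norms else 0.0


@dataclass
class InnovationMonitor:
    """Leaky accumulator with hysteresis (assumed design, not from the paper)."""
    decay: float = 0.8             # how strongly past residuals persist
    pause_threshold: float = 1.0   # accumulated score that triggers a pause
    resume_threshold: float = 0.3  # score below which execution resumes
    score: float = 0.0
    paused: bool = False

    def update(self, innovation):
        # Exponentially weighted accumulation: a single spike is damped,
        # while a residual that persists over several steps builds up.
        self.score = self.decay * self.score + (1.0 - self.decay) * innovation
        if not self.paused and self.score > self.pause_threshold:
            self.paused = True
        elif self.paused and self.score < self.resume_threshold:
            self.paused = False
        return self.paused


if __name__ == "__main__":
    predicted = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    observed = [[3.0, 4.0], [0.0, 0.0], [10.0, 0.0]]
    # Patch 2 changes a lot but lies outside the corridor, so it is ignored.
    step_innovation = corridor_innovation(predicted, observed, corridor=[0, 1])
    monitor = InnovationMonitor()
    for step in range(6):
        print(step, monitor.update(step_innovation), round(monitor.score, 3))
```

Under these assumptions, restricting the residual to corridor patches keeps scene changes elsewhere in the image from affecting the score, and the gap between the two thresholds keeps the monitor from toggling between pause and resume when the score hovers near a single cutoff.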