Foresight: Failure Detection for Long-Horizon Robotic Manipulation with Action-Conditioned World Model Latents

2026-06-22 Robotics

AI summary

The authors developed Foresight, a method for detecting failures in long-horizon robotic manipulation tasks, where it is hard to tell exactly when things go wrong. Instead of requiring detailed per-timestep labels, Foresight learns only from each episode’s final success or failure outcome, using latent representations from an action-conditioned world model. They tested it with state-of-the-art vision-language-action policies in simulation and on real robots, and report that it can reliably spot failures early. Because the world model captures how actions affect the environment, the same monitoring approach can be applied across different robots and tasks.

long-horizon tasks, failure detection, robotic manipulation, action-conditioned world model, latent representations, functional conformal prediction, trajectory monitoring, robot policies, simulation benchmarks, real robot experiments
Authors
Haoran Zhang, Yifu Lu, Boyang Wang, Xuhui Kang, Yen-Ling Kuo, Zezhou Cheng, Mengdi Wang, Odest Chadwicke Jenkins
Abstract
Long-horizon tasks are common in real-world robotic deployments, yet failure detection for such tasks remains underexplored. Detecting failures in long-horizon robotic tasks is particularly challenging because failure onset is often ambiguous and dense temporal annotations are typically unavailable. We present Foresight, a failure detection framework that monitors manipulation trajectories using latent representations from an action-conditioned world model. Foresight is trained using only final task-level success or failure labels. By leveraging predictive world-model embeddings, our method provides a unified framework for failure detection across different policies. We further use functional conformal prediction (FCP) to calibrate detection thresholds adaptively. We evaluate Foresight with state-of-the-art vision-language-action policies in simulation on LIBERO-Long, ManiSkill-Long, and BEHAVIOR-1K, compare it against state-of-the-art failure detection methods, and validate it on real robots with three long-horizon tasks on a ReactorX-200 arm and one task on a Franka arm. Our results suggest that action-conditioned world-model embeddings provide a scalable representation for reliable failure monitoring in long-horizon manipulation.
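
To make the threshold-calibration idea more concrete, the following is a minimal sketch of one common way to build a time-varying conformal band over per-timestep failure scores. It is not the authors' implementation: the abstract does not specify how FCP is applied, and the function names (`calibrate_band`, `first_alarm`), the max-over-time nonconformity measure, and the assumption that scores come from successful rollouts of equal length are all illustrative choices.

```python
import math
import numpy as np


def calibrate_band(fit_scores: np.ndarray, cal_scores: np.ndarray,
                   alpha: float = 0.1) -> np.ndarray:
    """Return a per-timestep upper threshold of shape (T,).

    fit_scores, cal_scores: arrays of shape (n_rollouts, T) holding failure
    scores from *successful* rollouts (hypothetical inputs). The two sets are
    kept separate so the calibration rollouts stay exchangeable with new ones.
    """
    mean = fit_scores.mean(axis=0)  # per-timestep reference curve
    # Nonconformity of a rollout: its largest excess over the reference curve.
    nonconf = np.sort((cal_scores - mean).max(axis=1))
    n = nonconf.shape[0]
    # Split-conformal rank: the k-th smallest calibration nonconformity value.
    k = math.ceil((n + 1) * (1 - alpha))
    width = nonconf[k - 1] if k <= n else np.inf
    return mean + width


def first_alarm(scores: np.ndarray, band: np.ndarray) -> int:
    """Return the first timestep whose score exceeds the band, or -1."""
    above = np.flatnonzero(scores > band[: len(scores)])
    return int(above[0]) if above.size else -1
```

Under this sketch, a successful rollout that is exchangeable with the calibration rollouts stays below the band at every timestep with probability at least 1 - alpha, so an alarm from `first_alarm` can be read as a flagged failure at that level. When the calibration set is too small for the chosen alpha, the band is infinite and no alarm is raised.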