Back to the Familiar Future: Failure Recovery for VLA Policies via Pre-Imagined Milestone Selection
2026-06-08 • Robotics
AI summary
The authors address the problem of vision-language-action (VLA) policies drifting off their nominal trajectories during manipulation, which makes recovery difficult. They propose Back to the Familiar Future (B2FF), which precomputes a bank of 'milestone' future states from the clean initial observation before the task starts. If the policy deviates, a recoverability-aware selector picks one milestone from the bank and holds it fixed as a visual goal that guides the policy back to familiar states. On failure-injected LIBERO, the approach raised the average success rate from 56.3% to 74.0% (a gain of 17.7 percentage points) without fine-tuning the low-level action generator, suggesting that pre-imagined visual goals can help a policy recover from deviations.
vision-language-action (VLA), manipulation recovery, visual conditioning, milestone bank, future state prediction, recovery framework, action sequence stability, failure recovery, LIBERO benchmark, foresight-driven policy
Authors
Suyeon Shin, Juwon Kim, Hyeonbin Park, Hyunseo Kim, Hyundo Lee, Hyung-Sin Kim, Byoung-Tak Zhang
Abstract
Vision-language-action (VLA) policies can deviate from nominal trajectories during manipulation, even when tasks remain physically feasible. Recovering from these deviations is challenging, as they push the policy into unfamiliar state spaces where direct re-planning frequently destabilizes action sequences. We propose Back to the Familiar Future (B2FF), a recovery framework for foresight-driven VLAs that leverages future visual conditioning as a recovery interface. Before execution, the VLA generates a milestone bank of familiar future states conditioned on the clean initial observation. At recovery time, a recoverability-aware selector selects a recovery milestone from this bank and enforces it as a fixed visual goal. This enables the VLA to robustly map off-trajectory observations back to a familiar future. On failure-injected LIBERO, under controlled recovery timing aligned with the injected failure, B2FF increases the average success rate of a baseline VLA from 56.3% to 74.0%, demonstrating that pre-imagined milestones can guide recovery without fine-tuning the low-level action generator.
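The two-phase flow described in the abstract (build a milestone bank before execution, then select one milestone at recovery time and hold it fixed) can be sketched roughly as follows. This is a toy illustration under stated assumptions, not the authors' implementation: the names `Milestone`, `build_milestone_bank`, `select_recovery_milestone`, and the `imagine_future` callable are hypothetical, milestones are stood in for by small float vectors instead of imagined images, and the scoring rule (distance to the current observation minus a small bonus for later milestones) is an assumed placeholder for the paper's recoverability-aware selector.

```python
from dataclasses import dataclass
from math import dist
from typing import Callable, List, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Milestone:
    step: int          # position along the imagined nominal rollout
    embedding: Vector  # stand-in for an imagined future observation


def build_milestone_bank(
    initial_obs: Vector,
    imagine_future: Callable[[Vector, int], Vector],  # hypothetical foresight model
    num_milestones: int,
) -> List[Milestone]:
    """Pre-imagine future states once, from the clean initial observation."""
    return [
        Milestone(step=k, embedding=imagine_future(initial_obs, k))
        for k in range(1, num_milestones + 1)
    ]


def select_recovery_milestone(
    bank: List[Milestone],
    current_obs: Vector,
    progress_weight: float = 0.1,  # assumed trade-off, not from the paper
) -> Milestone:
    """Assumed scoring rule: prefer milestones near the current observation,
    with a small bonus for milestones further along the task."""
    def score(m: Milestone) -> float:
        return dist(m.embedding, current_obs) - progress_weight * m.step

    return min(bank, key=score)


# Toy usage: the "foresight model" just moves along the first axis.
bank = build_milestone_bank((0.0, 0.0), lambda obs, k: (obs[0] + k, obs[1]), 4)
goal = select_recovery_milestone(bank, current_obs=(2.2, 0.5))
print(goal.step, goal.embedding)
```

In this sketch the selected `goal` would then be passed unchanged to the policy at every recovery step, mirroring the abstract's idea of enforcing the milestone as a fixed visual goal instead of re-imagining a future from the off-trajectory observation.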