Retrieve-then-Steer: Online Success Memory for Test-Time Adaptation of Generative VLAs

2026-05-11 · Robotics · Artificial Intelligence
AI summary

The authors study vision-language-action (VLA) models, robots that use vision and language to decide how to act, whose reliability often degrades in real deployments even though such robots typically operate repeatedly in the same or slowly changing environment. They develop a way for the robot to remember actions that worked well before and use that memory to guide new actions without changing the model's original training. By retrieving past successful actions, filtering out inconsistent ones, and blending the rest into the generation of new actions, the robot becomes more reliable, especially on longer or multi-step tasks. The approach improves performance without updating any model parameters during use.

Keywords
Vision-Language-Action models · Test-time adaptation · Closed-loop control · Trajectory consistency · Flow-matching sampler · Action retrieval · Generative models · Robotic manipulation · Long-horizon tasks · Non-parametric methods
Authors
Jianchao Zhao, Huoren Yang, Hu Yusong, Yuyang Gao, Qiguan Ou, Cong Wan, SongLin Dong, Zhiheng Ma, Yihong Gong
Abstract
Vision-Language-Action (VLA) models show strong potential for general-purpose robotic manipulation, yet their closed-loop reliability often degrades under local deployment conditions. Existing evaluations typically treat test episodes as independent zero-shot trials. However, real robots often operate repeatedly in the same or slowly changing environments, where successful executions provide environment-verified evidence of reliable behavior patterns. We study this persistent-deployment setting, asking whether a partially competent frozen VLA can improve its reliability by reusing its successful test-time experience. We propose an online success-memory guided test-time adaptation framework for generative VLAs. During deployment, the robot stores progress-calibrated successful observation-action segments in a long-term memory. At inference, it retrieves state-relevant action chunks, filters inconsistent candidates via trajectory-level consistency, and aggregates them into an elite action prior. To incorporate this prior into action generation, we introduce confidence-adaptive prior guidance, which injects the elite prior into an intermediate state of the flow-matching action sampler and adjusts the guidance strength based on retrieval confidence. This design allows the frozen VLA to exploit environment-specific successful experience while preserving observation-conditioned generative refinement. This retrieve-then-steer mechanism enables lightweight, non-parametric test-time adaptation without requiring parameter updates. Simulation and real-world experiments show improved task success and closed-loop stability, especially in long-horizon and multi-stage tasks.
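To make the retrieve-then-steer pipeline concrete, here is a minimal NumPy sketch of the three stages the abstract names: a success memory that stores observation-action pairs, retrieval plus trajectory-level consistency filtering into an elite prior, and confidence-adaptive injection of that prior into an Euler-style flow-matching sampler. All class and function names, the cosine-similarity retrieval, the median-based consistency threshold, and the numeric choices are illustrative assumptions, not the authors' implementation.

```python
import numpy as np


class SuccessMemory:
    """Long-term store of (observation embedding, action chunk) pairs
    harvested from successful test-time episodes."""

    def __init__(self):
        self.keys, self.chunks = [], []

    def add(self, obs_emb, action_chunk):
        self.keys.append(np.asarray(obs_emb, dtype=float))
        self.chunks.append(np.asarray(action_chunk, dtype=float))

    def retrieve(self, obs_emb, k=5):
        """Return the k most state-relevant chunks with cosine similarities."""
        keys = np.stack(self.keys)
        q = np.asarray(obs_emb, dtype=float)
        sims = keys @ q / (np.linalg.norm(keys, axis=1) * np.linalg.norm(q) + 1e-8)
        idx = np.argsort(-sims)[: min(k, len(sims))]
        return [self.chunks[i] for i in idx], sims[idx]


def elite_prior(chunks, sims, tol=1.5):
    """Drop trajectory-level outliers among retrieved chunks, then
    similarity-weight the survivors into one elite action prior.
    Also returns a scalar retrieval confidence."""
    chunks = np.stack(chunks)                     # (k, horizon, action_dim)
    median_traj = np.median(chunks, axis=0)
    dists = np.linalg.norm((chunks - median_traj).reshape(len(chunks), -1), axis=1)
    keep = dists <= tol * (np.median(dists) + 1e-8)   # consistency filter
    chunks, sims = chunks[keep], sims[keep]
    w = np.exp(sims) / np.exp(sims).sum()             # softmax over similarity
    prior = np.tensordot(w, chunks, axes=1)
    confidence = float(np.clip(sims.mean(), 0.0, 1.0))
    return prior, confidence


def steer_sample(velocity_fn, prior, confidence, shape,
                 steps=10, inject_at=5, rng=None):
    """Euler flow-matching sampler that injects the elite prior into an
    intermediate state, with guidance strength scaled by confidence."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(shape)                # start from noise
    dt = 1.0 / steps
    for i in range(steps):
        if i == inject_at:                        # prior-guidance step
            g = confidence                        # adaptive strength
            x = (1.0 - g) * x + g * prior
        x = x + dt * velocity_fn(x, i * dt)       # Euler integration
    return x
```

Because the prior is injected at an intermediate step rather than at the end, the remaining sampler iterations can still refine the action conditioned on the current observation, which matches the abstract's point that generative refinement is preserved while the frozen model stays untouched.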