Improving Robotic Generalist Policies via Flow Reversal Steering

2026-06-11

Robotics
AI summary

The authors developed a method called Flow Reversal Steering (FRS) that helps robots choose better actions by working backward from actions that are suboptimal but reasonable. The technique runs a flow matching generalist policy in reverse to recover the latent noise behind such an action, then maps that noise to a nearby action the policy already knows how to perform. This lets rough guidance from people or vision-language models be turned into good robot actions. The authors tested FRS on simulated and real-world manipulation tasks, showing that it can raise success rates and enable improvement on tasks where standard reinforcement learning fails to improve. They also showed that these gains can be distilled into an auxiliary policy in under a minute of training.

generalist policies, flow matching, robot manipulation, zero-shot control, behavioral cloning, reinforcement learning, vision-language models, latent noise, policy improvement, semantic guidance
Authors
Andy Tang, William Chen, Andrew Wagenmaker, Chelsea Finn, Sergey Levine
Abstract
Generalist policies can learn a wide range of skills from diverse robot datasets. To solve or improve on challenging new tasks, we need a way to infer and invoke the appropriate actions from the policy's rich behavioral prior, especially when directly commanding the policy fails. We focus on flow matching generalists and propose Flow Reversal Steering (FRS): a method that takes suboptimal but "reasonable" actions, finds their latent noises by passing them through the flow policy in reverse, and maps them to nearby generalist action modes. We evaluate FRS across many simulated and real-world manipulation settings. First, FRS can turn coarse semantic guidance from humans or vision-language models (VLMs) into corresponding good robot actions, improving zero-shot control. Second, these gains can be distilled with behavioral cloning by training an auxiliary policy to output noises that the generalist maps to good actions, yielding absolute task success rate gains of up to 95% in under a minute of training. Finally, FRS enables policy improvement by bootstrapping reinforcement learning with semantic knowledge, improving on several tasks where standard RL fails to make progress.
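
To make the reversal idea concrete, here is a minimal toy sketch in Python. It is not the authors' implementation. It assumes a 1-D flow policy whose action distribution is a single Gaussian mode, so the velocity field has a closed form, and every name in it (`MU`, `S`, `velocity`, `integrate`, `steer`, `max_abs_noise`) is hypothetical. The abstract does not say how inverted noises are mapped to nearby action modes; the clipping step below is only one assumed stand-in for that step.

```python
import numpy as np

# Toy 1-D "generalist": its action distribution is one Gaussian mode N(MU, S^2).
# MU and S are hypothetical values chosen for illustration.
MU, S = 1.0, 0.1

def velocity(x, t):
    """Closed-form marginal velocity for the linear path x_t = (1 - t) z + t a,
    with z ~ N(0, 1) and a ~ N(MU, S^2) drawn independently."""
    var_t = (1.0 - t) ** 2 + (t * S) ** 2
    return MU + (t * S**2 - (1.0 - t)) / var_t * (x - t * MU)

def integrate(x, t_start, t_end, n_steps=2000):
    """Midpoint (RK2) integration of dx/dt = velocity(x, t).
    Swapping t_start and t_end runs the same flow in reverse."""
    dt = (t_end - t_start) / n_steps
    t = t_start
    for _ in range(n_steps):
        x_mid = x + 0.5 * dt * velocity(x, t)
        x = x + dt * velocity(x_mid, t + 0.5 * dt)
        t += dt
    return x

def noise_to_action(z):
    """Forward flow: latent noise at t=0 to action at t=1."""
    return integrate(z, 0.0, 1.0)

def action_to_noise(a):
    """Reverse flow: action at t=1 back to latent noise at t=0."""
    return integrate(a, 1.0, 0.0)

def steer(a_guess, max_abs_noise=2.0):
    """Invert a rough action to noise, pull the noise into a typical range
    (an ASSUMED projection step, not taken from the paper), then decode it."""
    z = action_to_noise(a_guess)
    z_typical = np.clip(z, -max_abs_noise, max_abs_noise)
    return z, noise_to_action(z_typical)

# A rough guess of 1.5 lies five standard deviations from the mode at 1.0.
z, a_steered = steer(1.5)
print(f"inverted noise: {z:.3f}, steered action: {a_steered:.3f}")
```

In this toy setting the flow maps a noise `z` to `MU + S * z`, so the rough action 1.5 inverts to a noise near 5, and the clipped noise of 2 decodes to an action near 1.2, which sits inside the policy's mode. A real generalist would replace `velocity` with a learned, observation-conditioned network over action chunks.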