Guided Streaming Stochastic Interpolant Policy
2026-05-11 • Robotics
Robotics · Artificial Intelligence
AI summary
The authors address how to guide robot control policies at inference time without retraining, especially for fast and reactive tasks like avoiding obstacles. They develop a mathematical method using Stochastic Interpolants to ensure robots sample from desired target behaviors in real time. Their approach, called Streaming Stochastic Interpolant Policy (SSIP), improves on previous slower, chunk-based methods and supports two ways to guide policies: one that works without training and one that uses learned critics. Experiments show their method is more responsive and effective in complex, changing environments.
Inference-time guidance · Generative robot policies · Stochastic Interpolants · Backward Kolmogorov Equation · Streaming Stochastic Interpolant Policy · Trajectory ensemble · Conditional critic guidance · Real-time control · Dynamic environments · Policy adaptation
Authors
Puming Jiang, Meiyi Wang, Kelvin Lin, Ce Hao, Harold Soh
Abstract
Inference-time guidance is essential for steering generative robot policies toward dynamic objectives without retraining, yet existing methods are largely confined to chunk-based architectures that exhibit high latency and lack the reactivity needed for test-time preference alignment or obstacle avoidance. In this work, we formally derive the optimal guidance term for Stochastic Interpolants (SI) by analyzing the value function's time evolution via the Backward Kolmogorov Equation, establishing a modified drift that theoretically guarantees sampling from a target distribution. We apply this framework to real-time control through the Streaming Stochastic Interpolant Policy (SSIP), which generalizes the deterministic Streaming Flow Policy (SFP). Unifying this guidance law with the streaming architecture enables fast and reactive control. To support diverse deployment needs, we propose two complementary mechanisms: training-free Stochastic Trajectory Ensemble Guidance (STEG) that computes gradients on-the-fly for zero-shot adaptation, and training-based Conditional Critic Guidance (CCG) for amortized inference. Empirical evaluations demonstrate that our guided streaming approach significantly outperforms conventional chunk-based policies in reactivity and provides superior, physically valid guidance for dynamic, unstructured environments.
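The core idea in the abstract — adding a guidance term to the learned drift so that sampling is steered toward a target distribution, with gradients computed on-the-fly from an ensemble (the training-free STEG variant) — can be illustrated with a toy one-dimensional sampler. This is a minimal sketch under stated assumptions, not the paper's method: `base_drift`, `cost`, `ensemble_guidance`, and all numerical parameters below are hypothetical stand-ins for the learned SI velocity field, the task objective, and the paper's derived guidance law.

```python
import numpy as np

rng = np.random.default_rng(0)

def base_drift(x, t):
    # Stand-in for a learned stochastic-interpolant drift that
    # transports samples toward a goal at x = 1.0 as t -> 1.
    return (1.0 - x) / max(1.0 - t, 1e-3)

def cost(x):
    # Hypothetical obstacle penalty: a bump centered at x = 0.5.
    return np.exp(-((x - 0.5) ** 2) / 0.02)

def ensemble_guidance(x, eps=0.05, n=32):
    # Training-free, STEG-style idea: estimate the cost gradient from a
    # perturbation ensemble (Gaussian smoothing / Stein-type estimator),
    # then push against it to steer away from high-cost regions.
    xs = x + eps * rng.standard_normal(n)
    grad_est = np.mean((xs - x) * cost(xs)) / eps ** 2
    return -grad_est

def sample(lam, sigma=0.1, steps=100):
    # Euler-Maruyama integration of the guided SDE:
    # dx = [b(x, t) + lam * g(x)] dt + sigma dW.
    x, dt = 0.0, 1.0 / steps
    for k in range(steps):
        t = k * dt
        drift = base_drift(x, t) + lam * ensemble_guidance(x)
        x += drift * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

x_guided = sample(lam=2.0)  # detours around the obstacle, ends near 1.0
x_plain = sample(lam=0.0)   # unguided baseline, also ends near 1.0
```

The point of the sketch is the structure of the update: the guidance enters purely at inference time as an additive drift correction, so the same pretrained drift can be steered toward new objectives (here, obstacle avoidance) with no retraining. The paper's contribution is deriving what that correction must be, via the Backward Kolmogorov Equation, for the guided process to provably sample the target distribution.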