Flowing With Purpose: Latent Action Guided Flow Matching Policies For Robotic Manipulation

2026-06-22 | Robotics
AI summary

The authors study flow matching, a widely used technique for teaching robots to imitate demonstrated actions. They find that current methods start generation from a single fixed noise distribution, which does not match the varied, clustered structure of real robot movements. To fix this, they propose LAFM, which picks a learned starting distribution based on the robot's current observation. This makes training more efficient and raises task success rates both on benchmarks and in real-world deployments. The method also outperforms much larger pre-trained models while using smaller architectures.

Flow Matching, Behavior Cloning, Robotic Manipulation, Latent Action Model, Motion Primitives, Heteroscedasticity, Denoising Process, Task Success Rate, Vision-Language-Action Models, Policy Performance
Authors
Bruno Machado, Alexandre Chapin, Emmanuel Dellandrea, Liming Chen
Abstract
Flow matching has recently become a new standard for behavior cloning in robotic manipulation. However, state-of-the-art flow matching policies suffer from a systematic structural mismatch: they rely on a globally fixed isotropic source distribution despite the strongly fragmented and heteroscedastic structure of robotic action spaces. This agnostic initialization forces the model to learn highly entangled vector fields, bottlenecking training efficiency and limiting overall policy performance. To address this limitation, we introduce Latent Action Guided Flow Matching (LAFM), a novel framework that replaces the monolithic Gaussian with an adaptive library of learned prior distributions. By grounding these distributions using a latent action model, LAFM maps current observations to discrete motion primitives, selecting a specialized base distribution that provides an informed, structurally aligned initialization for the denoising process. This dynamic adaptivity naturally accommodates heteroscedasticity in human demonstrations and makes transport trajectories shorter and less entangled. Empirically, LAFM substantially outperforms standard flow matching formulations, increasing task success rates by 23.4% in real-world robotic deployments and by 10.4% on the LIBERO-90 benchmark. Furthermore, we demonstrate that LAFM achieves state-of-the-art results, surpassing massively pre-trained vision-language-action models while utilizing significantly smaller architectures.
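The core idea in the abstract, swapping a single isotropic Gaussian source for a primitive-specific one, can be illustrated with a small toy sketch. This is not the authors' implementation: the abstract does not specify the interpolation path, the form of the learned priors, or the latent action model. The sketch assumes a linear (rectified-flow style) path, diagonal Gaussian priors estimated from per-cluster statistics rather than learned, and uses the toy cluster labels in place of a latent action model's predictions. The names `sample_source` and `flow_matching_pair`, the 7-dimensional action space, and the three primitives are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
K, dim, batch = 3, 7, 4096  # assumed: 3 motion primitives, 7-D actions

# Toy heteroscedastic "demonstrations": one cluster per primitive, each with
# its own center and spread (all values are made up for illustration).
true_means = rng.uniform(-3.0, 3.0, size=(K, dim))
true_stds = np.array([0.1, 0.3, 0.5])[:, None] * np.ones((K, dim))
ids = rng.integers(0, K, size=batch)  # stand-in for latent action model output
actions = true_means[ids] + true_stds[ids] * rng.standard_normal((batch, dim))

# Hypothetical prior library: per-primitive diagonal Gaussians. Estimated here
# from empirical statistics; the paper describes learned priors instead.
prior_means = np.stack([actions[ids == k].mean(axis=0) for k in range(K)])
prior_stds = np.stack([actions[ids == k].std(axis=0) for k in range(K)])


def sample_source(rng, batch, dim, means=None, stds=None, primitive_ids=None):
    """Draw source samples x0; falls back to N(0, I) without a prior library."""
    eps = rng.standard_normal((batch, dim))
    if means is None:
        return eps
    return means[primitive_ids] + stds[primitive_ids] * eps


def flow_matching_pair(rng, x0, x1):
    """Build one training tuple for a linear interpolation path (assumed)."""
    t = rng.uniform(size=(x0.shape[0], 1))
    x_t = (1.0 - t) * x0 + t * x1   # point on the straight path at time t
    v_target = x1 - x0              # constant velocity the network regresses
    return t, x_t, v_target


x0_iso = sample_source(rng, batch, dim)
x0_lafm = sample_source(rng, batch, dim, prior_means, prior_stds, ids)
t, x_t, v_target = flow_matching_pair(rng, x0_lafm, actions)

# Average source-to-action distance under each choice of source distribution.
len_iso = np.linalg.norm(actions - x0_iso, axis=1).mean()
len_lafm = np.linalg.norm(v_target, axis=1).mean()
print(f"isotropic source: {len_iso:.2f}, primitive-specific source: {len_lafm:.2f}")
```

In this toy setting the primitive-specific source starts each sample inside the cluster it must reach, so the average transport distance is smaller than with the global isotropic source. That matches the abstract's qualitative claim of shorter transport trajectories, but it says nothing about the reported success rates.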