FlowMPC: Improving Flow Matching policies with World Models

2026-06-15 • Machine Learning

Machine Learning, Artificial Intelligence, Robotics

AI summary

The author examines Flow Matching (FM), a method that helps computers learn actions by imitating demonstrations. Because FM is not trained to maximize task success directly, it does not always pick the best actions. The work tests whether a world model, a learned model that lets the computer imagine future outcomes, can help FM choose better actions by planning ahead. The resulting system, FlowMPC, combines FM with this planning method and performs better than FM alone on two robot manipulation tasks. The results suggest that planning with a world model improves performance without changing how FM is trained.

Flow Matching, behavior cloning, world model, Model Predictive Path Integral (MPPI) planning, TD-MPC2, imitation learning, robot manipulation, ManiSkill, policy, expected return

Authors

Chandon Hamel

Abstract

Flow Matching (FM) is a powerful approach for behavior cloning in multimodal action spaces [Jiang et al., 2025], but because it is not trained to directly maximize expected return, there is still room to improve how FM policies act at test time. This work investigates whether a learned world model can improve FM policies by enabling Model Predictive Path Integral (MPPI) planning over candidate action sequences proposed by the policy. Building on TD-MPC2 [Hansen et al., 2024], I introduce FlowMPC, a framework that combines an imitation-learned FM policy with a learned world model for test-time planning in ManiSkill manipulation tasks [Tao et al., 2025]. Across PickCube and PickSingleYCB, adding the world model improved performance over the FM policy alone, with especially clear gains in end-of-episode success. These results suggest that world-model-based planning can effectively complement flow-based imitation policies without modifying the FM training objective.
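
As a rough illustration of the MPPI idea mentioned in the abstract, the sketch below shows a single return-weighted averaging step over candidate action sequences. It is not FlowMPC's implementation: the function names `mppi_weights` and `mppi_plan`, the temperature value, and the toy return function are all assumptions made for illustration. In the setting described above, the candidates would be proposed by the FM policy and the returns estimated by rolling them out in the learned world model; here both are replaced with random samples and a hand-written score.

```python
import numpy as np

def mppi_weights(returns, temperature=0.5):
    """Softmax weights over candidate returns (higher return, higher weight)."""
    returns = np.asarray(returns, dtype=np.float64)
    # Subtract the max before exponentiating for numerical stability.
    scaled = (returns - returns.max()) / temperature
    weights = np.exp(scaled)
    return weights / weights.sum()

def mppi_plan(candidates, returns, temperature=0.5):
    """Return the return-weighted mean action sequence.

    candidates: array of shape (num_candidates, horizon, action_dim)
    returns:    array of shape (num_candidates,)
    """
    candidates = np.asarray(candidates, dtype=np.float64)
    weights = mppi_weights(returns, temperature)
    # Broadcast the weights over horizon and action dimensions, then sum
    # over the candidate axis.
    return (weights[:, None, None] * candidates).sum(axis=0)

# Toy usage (assumed stand-ins, not the paper's setup):
rng = np.random.default_rng(0)
# Stand-in for action sequences sampled from a policy.
candidates = rng.uniform(-1.0, 1.0, size=(64, 5, 4))
# Stand-in for world-model return estimates: closer to a target is better.
target = np.full(4, 0.3)
returns = -((candidates - target) ** 2).sum(axis=(1, 2))

plan = mppi_plan(candidates, returns)
first_action = plan[0]  # in receding-horizon control, only this is executed
```

A lower temperature concentrates the weights on the highest-return candidates, while a higher one approaches a plain average. Note that averaging candidates drawn from a multimodal policy can blend distinct modes, so a practical planner may need additional care that this sketch omits.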
