HOLO-MPPI: Multi-Scenario Motion Planning via Hierarchical Policy Optimization

2026-06-15 • Robotics

Robotics, Artificial Intelligence

AI summary

The authors developed HOLO-MPPI, a method that combines high-level learned plans with low-level real-time control so that robots such as self-driving cars can handle many different situations without being retuned for each one. A high-level policy is trained offline to propose plans; during driving, an MPPI controller refines the proposed plan in real time to handle local disturbances. In evaluations across diverse driving scenarios, the approach improves on MPPI and end-to-end RL baselines while still running in real time.

reinforcement learning, model predictive path integral, motion planning, offline learning, real-time control, sampling prior, autonomous driving, distribution shift, stochastic optimal control

Authors

Youngjae Min, Jovin D'sa, Faizan M. Tariq, David Isele, Navid Azizan, Sangjae Bae

Abstract

Robots deployed in the real world must plan motions across diverse scenarios without per-scenario retuning. End-to-end reinforcement learning (RL) can generalize across scenarios but often becomes brittle under distribution shift, reward misspecification, and stochastic interactions. Model predictive path integral (MPPI) control enables strong real-time refinement without gradients, but its performance depends on a well-shaped sampling prior, while manually designing the priors does not scale to multi-scenario deployment. We present HOLO-MPPI (High-level Offline, Low-level Online MPPI), a multi-scenario motion planning framework that combines high-level policy learning with low-level stochastic optimal control. Offline, we learn a high-level policy that proposes scenario-robust plans in an abstract action space, with a learned world model for online rollout. Online, the policy serves as a data-driven prior generator that parameterizes MPPI's sampling distribution conditioned on the current observation and goal. MPPI then optimizes low-level control sequences around this prior in real time to adapt to local disturbances. We instantiate HOLO-MPPI in autonomous driving by designing an effective high-level action space and tailored model architectures. Our evaluation across diverse driving scenarios shows that HOLO-MPPI improves upon MPPI and end-to-end RL baselines while maintaining real-time control.
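
To make the idea of MPPI refining controls around a prior concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy 1D point with dynamics x' = x + u*dt, a quadratic distance-to-goal cost, and a hypothetical `high_level_prior` function that stands in for the learned high-level policy. All names and parameter values are illustrative assumptions, and the update is the basic softmin-weighted MPPI step without a control-cost term.

```python
import math
import random


def rollout_cost(x0, controls, goal, dt):
    """Integrate x' = x + u*dt and sum the squared distance to the goal."""
    x, cost = x0, 0.0
    for u in controls:
        x += u * dt
        cost += (x - goal) ** 2
    return cost


def high_level_prior(x0, goal, horizon):
    """Hypothetical stand-in for the learned policy: a coarse constant guess."""
    return [0.2 * (goal - x0)] * horizon


def mppi_refine(x0, goal, prior, sigma=1.0, lam=1.0, samples=512, dt=0.1, rng=None):
    """One basic MPPI update: sample around the prior mean, reweight by cost."""
    rng = rng or random.Random(0)
    horizon = len(prior)
    # Gaussian perturbations centered on the prior control sequence.
    noise = [[rng.gauss(0.0, sigma) for _ in range(horizon)] for _ in range(samples)]
    costs = [
        rollout_cost(x0, [m + e for m, e in zip(prior, eps)], goal, dt)
        for eps in noise
    ]
    # Softmin weights; subtracting the minimum cost keeps exp() numerically stable.
    c_min = min(costs)
    weights = [math.exp(-(c - c_min) / lam) for c in costs]
    total = sum(weights)
    weights = [w / total for w in weights]
    # Refined controls: prior mean plus the cost-weighted average perturbation.
    refined = [
        prior[t] + sum(w * eps[t] for w, eps in zip(weights, noise))
        for t in range(horizon)
    ]
    return refined, weights


if __name__ == "__main__":
    prior = high_level_prior(0.0, 1.0, 20)
    refined, _ = mppi_refine(0.0, 1.0, prior)
    print("prior cost:  ", rollout_cost(0.0, prior, 1.0, 0.1))
    print("refined cost:", rollout_cost(0.0, refined, 1.0, 0.1))
```

In this sketch the prior only sets the mean of the sampling distribution. A learned prior generator as described in the abstract could presumably shape other parameters of the distribution as well, but that is not modeled here.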
