M3imic: Learning a Versatile Whole-Body Controller for Multimodal Motion Mimicking

2026-06-03 | Robotics

AI summary

The authors present a single controller for humanoid robots that accepts several kinds of motion reference, such as robot joint angles, human pose trajectories, and end-effector (hand) poses. Modality-specific encoders map each input type into a shared latent space, so one trained policy can follow any of them. They evaluated the controller on a Unitree G1 robot in simulation and on real hardware, and it worked without separate retraining for each reference type. This makes the robot more flexible on tasks that combine walking and object manipulation.

humanoid robots, whole-body control, motion reference modalities, end-effector trajectory, reinforcement learning, sim-to-real transfer, latent space, robot joint angles, loco-manipulation
Authors
Zuxing Lu, Ziang Zheng, Yao Lyu, Jingyu Liu, Feihong Zhang, Song Lu, Xin Yuan, Changyin Sun, Xingxing Zuo, Shengbo Eben Li
Abstract
Building a general-purpose whole-body controller is essential for enabling diverse motion capabilities in humanoid robots across a wide range of downstream tasks, including locomotion and loco-manipulation. Different tasks rely on distinct motion reference modalities: locomotion primarily depends on coordinated robot joint trajectories, whereas manipulation requires precise end-effector trajectory tracking. Existing methods often overlook the representational mismatch between dense robot joint angles and sparse end-effector poses. To address this, we propose Multi-Modal Mimic (M3imic), a versatile multi-modal whole-body control framework that unifies heterogeneous motion reference modalities, including robot joint angles, human pose trajectories, and end-effector poses, using modality-specific encoders to map them into a shared latent space. Leveraging large-scale reinforcement learning in simulation, we train a single policy that achieves sim-to-real transfer across multiple motion reference modalities without modality-specific retraining. We evaluate the proposed framework through extensive simulation and real-world experiments on the Unitree G1 robot. In simulation, the policy achieves a peak success rate of 98.42% on an unseen test dataset, demonstrating strong generalization. The code is available at https://github.com/Renforce-Dynamics/MultiModalWBC
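To make the shared-latent-space idea concrete, the following is a minimal illustrative sketch, not the released implementation. It shows one encoder per reference modality projecting inputs of different sizes into a latent vector of fixed size, which a single policy then consumes together with proprioception. Every name and dimension here (`MODALITY_DIMS`, `LATENT_DIM`, `PROPRIO_DIM`, `NUM_ACTIONS`, `MultiModalEncoder`, `policy_step`) is a hypothetical placeholder, and the random linear maps stand in for networks that would be learned with reinforcement learning.

```python
import random

# All sizes below are hypothetical placeholders, not values from the paper.
LATENT_DIM = 8
PROPRIO_DIM = 6
NUM_ACTIONS = 4
MODALITY_DIMS = {
    "joint_angles": 12,   # dense reference: one value per joint
    "human_pose": 9,      # e.g. a few 3D keypoints
    "end_effector": 14,   # e.g. position + quaternion for two hands
}


def make_linear(in_dim, out_dim, rng):
    """Build a random (out_dim x in_dim) weight matrix as nested lists."""
    scale = 1.0 / in_dim ** 0.5
    return [[rng.uniform(-scale, scale) for _ in range(in_dim)]
            for _ in range(out_dim)]


def apply_linear(weights, x):
    """Multiply a weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class MultiModalEncoder:
    """One linear encoder per modality, all mapping into the same latent size."""

    def __init__(self, seed=0):
        rng = random.Random(seed)
        self.encoders = {
            name: make_linear(dim, LATENT_DIM, rng)
            for name, dim in MODALITY_DIMS.items()
        }

    def encode(self, modality, reference):
        if modality not in self.encoders:
            raise KeyError(f"unknown modality: {modality}")
        if len(reference) != MODALITY_DIMS[modality]:
            raise ValueError(f"bad input size for {modality}")
        return apply_linear(self.encoders[modality], reference)


def policy_step(policy_weights, latent, proprio):
    """Single shared policy: acts on the latent plus proprioception only."""
    return apply_linear(policy_weights, latent + proprio)


rng = random.Random(1)
encoder = MultiModalEncoder(seed=0)
policy_weights = make_linear(LATENT_DIM + PROPRIO_DIM, NUM_ACTIONS, rng)
proprio = [0.0] * PROPRIO_DIM

# The same policy weights are reused for every reference modality.
actions = {}
for name, dim in MODALITY_DIMS.items():
    reference = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
    latent = encoder.encode(name, reference)
    actions[name] = policy_step(policy_weights, latent, proprio)
```

Because the policy only ever sees a vector of length `LATENT_DIM`, supporting a new reference type in this sketch means adding an encoder rather than retraining a separate policy for each modality.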