VENOM: Versatile Embodied Network for Omni-bodied Motion tracking

2026-06-15

Robotics
AI summary

The authors present VENOM, a model that lets humanoid robots with different body shapes imitate full-body human motion in simulation. Previous cross-embodiment methods split control into separate upper-body and lower-body parts. VENOM instead tracks the whole body with a single GPT-based model. To train it, the authors built the VENOM dataset, which records states, actions, and rewards from several humanoid robots. In their experiments, VENOM outperforms an MLP trained with supervised learning on the same data. It also closely matches expert policies trained with reinforcement learning, even though VENOM itself receives no reward feedback.

humanoid robots, full-body motion tracking, cross-embodiment, GPT model, motion dataset, supervised learning, reinforcement learning, asymmetric actor-critic, multi-humanoid data, robot control
Authors
Siddharth Padmanabhan, Kazuki Miyazawa, Takato Horii
Abstract
Achieving expert-level, expressive full-body motion tracking across multiple humanoids solely from demonstration data remains a challenging and relatively underexplored problem in humanoid robot learning. Cross-embodiment motion tracking policies are mostly trained by decoupling the control problem into upper-body and lower-body control. This work proposes VENOM, a cross-embodiment full-body motion tracking model for humanoids in simulation. VENOM is a GPT-based motion tracker. It is trained on data from multiple humanoids and tracks the entire body without splitting control into upper-body and lower-body parts. We curate a multi-humanoid motion tracking dataset, the VENOM dataset, which contains states, actions, and rewards, and we train VENOM and the baselines on it. In this letter, we evaluate VENOM against these baselines. We show that it yields a stable motion tracker across different humanoids. This tracker is more capable than an MLP trained on the same multi-humanoid data with supervised learning alone. We also show that, despite the lack of reward feedback, VENOM closely matches the tracking capability of experts trained with asymmetric actor-critic reinforcement learning.
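The abstract describes VENOM only as a GPT-based tracker trained on data from several humanoids, without splitting upper-body and lower-body control. It does not say how one model handles robots with different joint counts. The NumPy sketch below illustrates one common way a single causal model could do this. It zero-pads each trajectory to a shared width, runs one single-head causal self-attention layer over the state history, and trains with a masked behavior-cloning (supervised) loss. Everything here is an assumption, not the paper's design. The padding-and-masking scheme and the equal-sized joint-space states and actions are assumptions. `MAX_DOF`, `D_MODEL`, and all function names are hypothetical. The sketch also ignores the rewards that the VENOM dataset records.

```python
import numpy as np

MAX_DOF = 8    # hypothetical: largest joint count among the humanoids
D_MODEL = 16   # hypothetical embedding width


def pad_to_max(x, max_dof=MAX_DOF):
    """Zero-pad a (T, dof) trajectory to (T, max_dof) and return a 0/1 joint mask."""
    t, dof = x.shape
    padded = np.zeros((t, max_dof))
    padded[:, :dof] = x
    mask = np.zeros((t, max_dof))
    mask[:, :dof] = 1.0
    return padded, mask


def causal_self_attention(h, w_q, w_k, w_v):
    """Single-head attention in which step t only attends to steps <= t."""
    q, k, v = h @ w_q, h @ w_k, h @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    future = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)  # exp(-inf) == 0, so future steps get zero weight
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v


def init_params(seed=0):
    """Hypothetical randomly initialized weights shared by every humanoid."""
    rng = np.random.default_rng(seed)
    shapes = {"w_in": (MAX_DOF, D_MODEL), "w_q": (D_MODEL, D_MODEL),
              "w_k": (D_MODEL, D_MODEL), "w_v": (D_MODEL, D_MODEL),
              "w_out": (D_MODEL, MAX_DOF)}
    return {name: rng.normal(0.0, 0.1, shape) for name, shape in shapes.items()}


def policy(states, params):
    """Map a padded (T, MAX_DOF) state history to (T, MAX_DOF) predicted actions."""
    h = states @ params["w_in"]
    h = h + causal_self_attention(h, params["w_q"], params["w_k"], params["w_v"])
    return h @ params["w_out"]


def masked_bc_loss(pred, target, mask):
    """Behavior-cloning MSE averaged over real joints; padded joints contribute nothing."""
    return float(((pred - target) ** 2 * mask).sum() / mask.sum())


# Usage: two hypothetical humanoids with 6 and 8 joints share one set of weights.
params = init_params()
data_rng = np.random.default_rng(1)
for dof in (6, 8):
    states, mask = pad_to_max(data_rng.normal(size=(5, dof)))
    expert_actions, _ = pad_to_max(data_rng.normal(size=(5, dof)))
    loss = masked_bc_loss(policy(states, params), expert_actions, mask)
    print(f"{dof}-DoF humanoid: masked BC loss = {loss:.3f}")
```

Padded joints are masked out of the loss, so one set of weights can be fit on every humanoid's demonstrations. The causal mask makes each prediction depend only on the current and past states, which is all an online tracker can observe.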