HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers
2026-06-04 • Robotics
Robotics • Artificial Intelligence • Machine Learning
AI summary
The authors present HANDOFF, a whole-body controller for humanoid robots built around a compact, explicit command interface between task planning and motion control. Rather than requiring dense motion references, HANDOFF is distilled from three specialist controllers covering whole-body motion tracking with safety-filtered data, locomotion, and fall recovery. On the Unitree G1 robot, it matches state-of-the-art velocity tracking while offering a large, robust manipulation workspace. Paired with a VLM-driven planner, it also carries out natural-language-driven tasks on hardware with no task-specific data or controller fine-tuning.
Humanoid Robot • Whole-Body Control • Command Space • KL Distillation • Mixture-of-Experts • Locomotion • Fall-Recovery • Velocity Tracking • Vision-Language Model (VLM) • Agentic Planner
Authors
Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou, Gio Huh, Robert Griffin, Georgia Gkioxari, Aaron Ames
Abstract
For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface. HANDOFF is a mixture-of-experts student distilled from three complementary specialists (whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery) via multi-teacher KL distillation under a context-conditioned gating scheme. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.
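To make the distillation idea in the abstract concrete, the following is a minimal sketch of a gated multi-teacher KL loss, not the authors' implementation. It assumes that every policy outputs a diagonal Gaussian over actions, that the gate is a hard one-hot selection keyed on a context label, and that the loss uses KL(teacher || student); the abstract specifies none of these details. The names `TEACHERS`, `context_gate`, `gaussian_kl`, and `gated_distillation_loss` are hypothetical.

```python
import math
from typing import Dict, List, Tuple

# (mean, std) per action dimension; the diagonal-Gaussian form is an assumption.
Gaussian = Tuple[List[float], List[float]]

# Hypothetical labels for the three specialists named in the abstract.
TEACHERS = ("tracking", "locomotion", "recovery")


def gaussian_kl(p: Gaussian, q: Gaussian) -> float:
    """Closed-form KL(p || q) for diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mean_p, std_p, mean_q, std_q in zip(p[0], p[1], q[0], q[1]):
        total += (
            math.log(std_q / std_p)
            + (std_p ** 2 + (mean_p - mean_q) ** 2) / (2.0 * std_q ** 2)
            - 0.5
        )
    return total


def context_gate(context: str) -> Dict[str, float]:
    """One-hot gate over teachers (assumed; the real gate may be soft or learned)."""
    if context not in TEACHERS:
        raise ValueError(f"unknown context: {context!r}")
    return {name: 1.0 if name == context else 0.0 for name in TEACHERS}


def gated_distillation_loss(
    student: Gaussian,
    teachers: Dict[str, Gaussian],
    gate: Dict[str, float],
) -> float:
    """Gate-weighted sum of KL(teacher || student) over all teachers."""
    return sum(gate[name] * gaussian_kl(teachers[name], student) for name in teachers)


if __name__ == "__main__":
    # Toy one-dimensional action distributions, for illustration only.
    teachers = {
        "tracking": ([0.0], [1.0]),
        "locomotion": ([3.0], [1.0]),
        "recovery": ([1.0], [1.0]),
    }
    student = ([1.0], [1.0])
    for context in TEACHERS:
        loss = gated_distillation_loss(student, teachers, context_gate(context))
        print(context, loss)
```

With a hard gate, each sample pulls the student toward exactly one specialist, which is the simplest reading of "context-conditioned"; a soft gate would blend the teachers' targets using the same weighted sum.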