PHASOR: Phase-Anchored Universal Action Representations for Humanoid Embodiments
2026-06-01 • Robotics
AI summary
The authors propose an action representation for robots that is shared across embodiments rather than tied to a single platform. Motion is split into a periodic component, described by FFT-derived coefficients on a phase manifold, and a pose branch that captures non-periodic configuration detail. Combined with motion-semantic distillation, this factorization is intended to make the action space interpretable and embodiment-agnostic. The shared space is pretrained on human motion, and multiple humanoid robots are then anchored to it, which the authors report yields strong cross-embodiment retrieval and consistent gains on downstream robot tasks.
action embedding, robot policy learning, motion periodicity, FFT (Fast Fourier Transform), phase manifold, pose representation, cross-embodiment, motion semantics, knowledge distillation, robot transfer learning
Authors
Kihyun Kim, Chaeyun Kim, Jongho Shin, Taeyoun Kwon, Junghyun Kim, Mijin Koo, Haon Park
Abstract
Learning a good action embedding space is fundamental to scalable robot policy learning, yet existing methods treat action latents as task-specific intermediates rather than first-class representations. The resulting latents are unstructured, embodiment-specific, and weakly tied to motion semantics, limiting interpretability, controllability, and transferability across robots. We position the action embedding space itself as a first-class design target, with downstream policy quality emerging from representation quality. Exploiting motion's intrinsic periodicity, we factorize it into a phase manifold that captures cyclic structure via FFT-parametric coefficients, together with a pose branch that conditions the manifold on non-periodic configuration detail. Combined with motion-semantic distillation, this factorized structure yields a cross-embodiment motion manifold that is interpretable and embodiment-agnostic by design. Anchoring multiple humanoid robots to a shared human-pretrained manifold then produces a unified action embedding space across diverse platforms, achieving strong cross-embodiment retrieval and consistent gains on downstream robot tasks.
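The abstract does not spell out how the FFT-parametric coefficients are computed, so the following is only a minimal sketch of one common way to parameterize a periodic signal with an FFT. It assumes a single 1-D latent channel sampled at a fixed rate whose dominant frequency falls on an FFT bin below the Nyquist frequency. The function names `fft_phase_parameters` and `phase_embedding`, and the choice of amplitude, frequency, offset, and phase as the four coefficients, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def fft_phase_parameters(signal, sample_rate):
    """Estimate (amplitude, frequency, offset, phase) of the dominant
    periodic component of a 1-D signal. Hypothetical helper, not from the paper."""
    signal = np.asarray(signal, dtype=float)
    n = signal.shape[0]

    # The offset is the mean (DC component) of the window.
    offset = signal.mean()

    # Real FFT of the zero-mean signal and the matching frequency axis in Hz.
    spectrum = np.fft.rfft(signal - offset)
    freqs = np.fft.rfftfreq(n, d=1.0 / sample_rate)

    # Pick the strongest non-DC bin as the dominant frequency.
    power = np.abs(spectrum) ** 2
    k = 1 + int(np.argmax(power[1:]))

    # For a cosine lying exactly on bin k (below Nyquist), |X[k]| = A * n / 2.
    amplitude = 2.0 * np.abs(spectrum[k]) / n
    frequency = freqs[k]

    # Phase of the cosine at the start of the window, in radians.
    phase = np.angle(spectrum[k])
    return amplitude, frequency, offset, phase


def phase_embedding(amplitude, phase):
    """Map amplitude and phase to a 2-D point; points with the same
    amplitude lie on a circle, which gives the cyclic structure."""
    return np.array([amplitude * np.cos(phase), amplitude * np.sin(phase)])


# Usage: one second of a synthetic 2 Hz channel sampled at 64 Hz.
t = np.arange(64) / 64.0
channel = 0.5 + 2.0 * np.cos(2.0 * np.pi * 2.0 * t + 0.3)
amp, freq, off, shift = fft_phase_parameters(channel, sample_rate=64.0)
point = phase_embedding(amp, shift)
```

In a learned model these coefficients would typically be computed per latent channel over a sliding window, with the pose branch supplying the non-periodic detail that the circular embedding alone cannot express.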