WaveSync: Constrained Wavefront Optimization for Synchronized Co-Speech Gestures in Humanoid Robots

2026-06-15Robotics

Robotics
AI summary

The authors developed WaveSync, a system that helps humanoid robots make hand gestures that match what they're saying. It breaks down speech into important words and uses this to guide smooth, realistic gestures that follow the robot's movement limits. They fine-tune the timing so the gestures sync well with speech and avoid movements that the robot can't do. Tests showed WaveSync does better than other methods in making natural and well-coordinated gestures.

co-speech gestureshumanoid robotsLarge Language ModelDynamic Movement Primitivessemantic importancegesture-speech synchronizationkinematic constraintsmotion planningwavefront optimizationgesture trajectories
Authors
Thang Tran Viet, Thanh Nguyen Canh, Gia Huy Uong, Phuc Van Dinh, Tan Viet Tuyen Nguyen, Xiem HoangVan, Nak Young Chong
Abstract
Expressive co-speech gestures are crucial for natural human-robot interaction, but generating them on physical humanoid robots is difficult because gesture strokes must align with speech emphasis while satisfying strict kinematic and dynamic constraints. Unlike virtual avatars, humanoid robots cannot freely execute rapid or overlapping motions, making word-level synchronization and hardware-safe motion planning a coupled problem. We present \textbf{WaveSync}, a hybrid framework in which a Large Language Model decomposes dialogue responses into structured semantic schemas and assigns per-word importance weights, constructing a continuous Semantic Importance Wave. Gesture trajectories are shaped through Dynamic Movement Primitives, enforcing kinematic feasibility while enhancing expressiveness. A Wavefront Optimization stage aligns peak-to-peak gesture-speech synchronization and resolves residual kinematic violations through gesture-duration compression and forward propagation. Experimental evaluation based on five dialogue scenarios shows that our method achieves high synchronization accuracy and outperforms three baselines in both objective and subjective evaluations. Each component in WaveSync plays a necessary role in producing gestures that are expressive, semantically grounded, and kinematically compliant. The code, resources, and videos are available at \href{https://github.com/pairs-lab/WaveSync}{WaveSync}