MotionDreamer: Universal Skeletal Motion Generation for 3D Rigged Shapes

2026-06-01 • Computer Vision and Pattern Recognition

Computer Vision and Pattern Recognition, Graphics

AI summary

The authors present MotionDreamer, a method that generates animations for rigged 3D models from 2D videos without relying on a fixed body template or expensive per-case optimization. To train it, they built a dataset of approximately 20,000 textured, rigged 3D models with animations. The approach connects visual motion in 2D videos to differing 3D skeletons by attaching texture and semantic information to the joints, which lets it animate many unseen categories of creatures realistically. The authors' experiments show that the method outperforms previous systems and is more efficient at creating animated 3D models.

rigged shapes, skeletal animation, diffusion model, 2D video guidance, 3D models, texture mapping, animation synthesis, kinematics, semantic injection, 4D asset production

Authors

Ye Tao, Yuxin Yao, Kendong Liu, Dapeng Wu, Junhui Hou

Abstract

Motion generation for rigged shapes is vital for scalable 4D asset production. However, template-based methods are limited by specific topologies and fail to generalize across diverse morphologies. Conversely, per-case optimization is computationally expensive, susceptible to local optima, and highly sensitive to viewpoint-induced ambiguities. In this paper, we present MotionDreamer, a diffusion-based framework designed for category-agnostic skeletal animation generation from 2D video guidance. To overcome the scarcity of high-quality training data, we have curated a large-scale dynamic dataset comprising approximately 20,000 diverse 3D models, each featuring complete textures, skeletal rigging, and a wide range of animation sequences. To bridge the kinematic gap between 2D visual motion cues and heterogeneous 3D skeletal structures, we propose a structural-semantic injection mechanism. Our model integrates texture and semantic attributes directly into skeletal joint representations, allowing it to map perceived visual dynamics to specific joint hierarchies and their functional roles. As a result, MotionDreamer can synthesize high-fidelity animations that maintain anatomical consistency across a vast range of unseen categories, from existing biological species to fantastical beings. Extensive experiments demonstrate that our approach significantly outperforms existing methods, setting a new state of the art for robust and efficient 4D asset generation. The code will be made publicly available upon acceptance.
