AnimateAnyMesh++: A Flexible 4D Foundation Model for High-Fidelity Text-Driven Mesh Animation

2026-04-29
Computer Vision and Pattern Recognition

Keywords: 4D content generation, 3D mesh animation, spatio-temporal modeling, DyMesh-XL dataset, variational autoencoder (VAE), power-law topology, vertex normals, rectified-flow generator, sequence modeling, text-driven animation
Authors
Zijie Wu, Chaohui Yu, Fan Wang, Xiang Bai
Abstract
Recent advances in 4D content generation have attracted increasing attention, yet creating high-quality animated 3D models remains challenging due to the complexity of modeling spatio-temporal distributions and the scarcity of 4D training data. We present AnimateAnyMesh++, a feed-forward framework for text-driven animation of arbitrary 3D meshes with substantial upgrades in data, architecture, and generative capability. First, we expand the DyMesh-XL dataset by mining dynamic content from Objaverse-XL, increasing the number of unique identities from 60K to 300K and substantially broadening category and motion diversity. Second, we redesign DyMeshVAE-Flex with power-law topology-aware attention and vertex-normal enhanced features, which significantly improves trajectory reconstruction and local geometry preservation while mitigating trajectory-sticking artifacts. Third, we introduce architectural changes to both DyMeshVAE-Flex and the rectified-flow (RF) generator to support variable-length sequence training and generation, enabling longer animations while preserving reconstruction fidelity. Extensive experiments demonstrate that AnimateAnyMesh++ generates semantically accurate and temporally coherent mesh animations within seconds, surpassing prior approaches in quality and efficiency. The enlarged DyMesh-XL dataset, the upgraded DyMeshVAE-Flex, and the variable-length RF generator together deliver consistent gains across benchmarks and on in-the-wild meshes. We will release code, models, and the expanded DyMesh-XL upon acceptance of this manuscript to facilitate research in 4D content creation.
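The abstract names a rectified-flow (RF) generator but gives no implementation detail. As background only, the following is a minimal, hedged sketch of the generic rectified-flow sampling recipe (Euler integration of a learned velocity field from noise at t=0 to data at t=1). The function names, the toy closed-form velocity field, and the single-token `target` are illustrative assumptions, not the paper's actual model, which predicts velocities over variable-length latent trajectory sequences.

```python
import numpy as np

def rf_sample(velocity_fn, x0, n_steps=50):
    """Generic rectified-flow sampler: Euler-integrate dx/dt = v(x, t)
    from t = 0 (noise) to t = 1 (data). `velocity_fn` stands in for the
    trained velocity network; in the paper it would be conditioned on
    the text prompt and the mesh's latent tokens."""
    x = x0.copy()
    dt = 1.0 / n_steps
    for k in range(n_steps):
        t = k * dt
        x = x + dt * velocity_fn(x, t)
    return x

# Toy stand-in for a "trained" velocity field (an assumption for this
# sketch): under the straight-path coupling x_t = (1 - t) * x0 + t * x1
# with a fixed target x1, the velocity is v(x, t) = (x1 - x) / (1 - t).
target = np.array([[0.5, -1.0, 2.0]])  # e.g. one latent token
velocity = lambda x, t: (target - x) / (1.0 - t)

noise = np.zeros_like(target)
sample = rf_sample(velocity, noise, n_steps=20)
# For this toy field, Euler integration lands exactly on the target.
```

The straight (rectified) probability path is what makes few-step Euler sampling viable, which is consistent with the abstract's claim of generating animations "within seconds".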