AnimationBench: Are Video Models Good at Character-Centric Animation?
2026-04-16 • Computer Vision and Pattern Recognition
AI summary
The authors created AnimationBench, a new benchmark for judging how well animation-style videos are generated from images. They noticed that existing tests mostly focus on realistic videos, which don't suit animations with their unique looks and exaggerated movements. Their benchmark uses classic animation principles and other quality measures to judge animation videos fairly, and it supports both fixed and open-ended tests. They also use AI models to help with scoring, and their experiments show that AnimationBench matches human opinions better and finds issues current tests miss.
video generation, animation style, image-to-video (I2V), benchmark, Twelve Basic Principles of Animation, semantic consistency, motion rationality, camera motion consistency, evaluation metrics, visual-language models
Authors
Leyi Wu, Pengjun Fang, Kai Sun, Yazhou Xing, Yinwei Wu, Songsong Wang, Ziqi Huang, Dan Zhou, Yingqing He, Ying-Cong Chen, Qifeng Chen
Abstract
Video generation has advanced rapidly, with recent methods producing increasingly convincing animated results. However, existing benchmarks, largely designed for realistic videos, struggle to evaluate animation-style generation with its stylized appearance, exaggerated motion, and character-centric consistency. Moreover, they rely on fixed prompt sets and rigid pipelines, offering limited flexibility for open-domain content and custom evaluation needs. To address this gap, we introduce AnimationBench, the first systematic benchmark for evaluating animation image-to-video generation. AnimationBench operationalizes the Twelve Basic Principles of Animation and IP Preservation into measurable evaluation dimensions, together with Broader Quality Dimensions including semantic consistency, motion rationality, and camera motion consistency. The benchmark supports both a standardized closed-set evaluation for reproducible comparison and a flexible open-set evaluation for diagnostic analysis, and leverages visual-language models for scalable assessment. Extensive experiments show that AnimationBench aligns well with human judgment and exposes animation-specific quality differences overlooked by realism-oriented benchmarks, leading to more informative and discriminative evaluation of state-of-the-art I2V models.