AAD-1: Asymmetric Adversarial Distillation for One-Step Autoregressive Video Generation
2026-06-02 • Computer Vision and Pattern Recognition
AI summary
The authors introduce AAD-1, a method for one-step autoregressive image-to-video generation. Unlike earlier approaches, the generator and discriminator are deliberately asymmetric: the generator stays causal and produces the video autoregressively, while the discriminator looks at the whole video at once to better spot errors that build up over time. They also add a warm-up training phase based on distribution matching, so that the generator is stable before adversarial training begins. Their VBench experiments show state-of-the-art results for one-step autoregressive video generation, without the frozen motion that is a common problem in earlier methods.
Keywords: autoregressive generation, adversarial distillation, generator, discriminator, motion collapse, bidirectional attention, spatiotemporal context, distribution matching, training stability, video generation
Authors
Haobo Li, Yanhong Zeng, Yunhong Lu, Jiapeng Zhu, Hao Ouyang, Qiuyu Wang, Ka Leong Cheng, Yujun Shen, Zhipeng Zhang
Abstract
We present AAD-1, an Asymmetric Adversarial Distillation framework for One-step autoregressive image-to-video generation. State-of-the-art methods adopt adversarial distillation but suffer from motion collapse and training instability, resulting in static videos. AAD-1 addresses these challenges through two key designs in architecture and training strategy. Our key architectural insight is to break the symmetry between generator and discriminator. While the generator remains causal to preserve autoregressive sampling capability, the discriminator attends bidirectionally over the full spatiotemporal context and produces a single holistic realism score for the entire video sequence. This asymmetric design enables the discriminator to effectively detect global temporal failures and long-range drift that cause motion collapse in autoregressive generation. To stabilize training, we introduce a phased strategy that first uses distribution matching to bootstrap a stable one-step generator, providing a warm-up phase that brings the student distribution closer to the teacher before adversarial distillation begins. Extensive experiments on VBench demonstrate that AAD-1 achieves state-of-the-art performance in one-step autoregressive video generation.
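The asymmetry described in the abstract can be illustrated with attention masks. The following is a minimal sketch, not the authors' implementation. It assumes frame-level (block) causality for the generator, and the function names, the `tokens_per_frame` parameter, and the `warmup_steps` threshold are all hypothetical. The abstract specifies only that the generator is causal, that the discriminator attends bidirectionally over the full spatiotemporal context, and that a distribution-matching phase precedes adversarial distillation.

```python
def frame_causal_mask(num_frames, tokens_per_frame):
    """Generator-side mask (assumed block-causal over frames).

    mask[q][k] is True when query token q may attend to key token k,
    i.e. when k belongs to the same frame as q or to an earlier frame.
    """
    n = num_frames * tokens_per_frame
    return [
        [(k // tokens_per_frame) <= (q // tokens_per_frame) for k in range(n)]
        for q in range(n)
    ]


def bidirectional_mask(num_frames, tokens_per_frame):
    """Discriminator-side mask: every token attends to every other token,
    so the whole video is visible when a single realism score is produced."""
    n = num_frames * tokens_per_frame
    return [[True] * n for _ in range(n)]


def training_phase(step, warmup_steps):
    """Hypothetical phase switch: distribution matching first,
    adversarial distillation once the warm-up is over."""
    return "distribution_matching" if step < warmup_steps else "adversarial"


if __name__ == "__main__":
    # Toy sizes: 3 frames with 2 tokens each.
    for row in frame_causal_mask(3, 2):
        print("".join("x" if allowed else "." for allowed in row))
    print(training_phase(0, 1000), training_phase(1000, 1000))
```

In this toy setting, a token in the first frame sees only first-frame tokens under the causal mask, whereas the bidirectional mask exposes all frames at once. That full view is the property the abstract credits with detecting global temporal failures and long-range drift.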