Spectral Progressive Diffusion for Efficient Image and Video Generation

2026-05-18 • Computer Vision and Pattern Recognition

Authors

Howard Xiao, Brian Chao, Lior Yariv, Gordon Wetzstein

Abstract

Diffusion models have been shown to implicitly generate visual content autoregressively in the frequency domain, where low-frequency components are generated earlier in the denoising process while high-frequency details emerge only in later timesteps. This structure offers a natural opportunity for efficient generation, as high-resolution computation on noise-dominated frequencies is largely redundant. We propose Spectral Progressive Diffusion, a general framework that progressively grows resolution along the denoising trajectory of pretrained diffusion models. To this end, we develop a spectral noise expansion mechanism and derive an optimal resolution schedule from the model's power spectrum. Our framework supports training-free acceleration and a novel fine-tuning recipe that further improves efficiency and quality. We demonstrate significant speedups on state-of-the-art pretrained image and video generation models while preserving visual quality.
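To make the idea of growing resolution in the frequency domain concrete, the following is a minimal sketch of one way a noisy low-resolution sample could be expanded to a higher resolution: its spectrum is kept in the low-frequency bins, and the newly added high-frequency bins are filled with fresh Gaussian noise. This is an illustration under stated assumptions, not the paper's spectral noise expansion mechanism; the function name `spectral_expand` and the parameters `factor`, `noise_std`, and `seed` are hypothetical.

```python
import numpy as np

def spectral_expand(x_low, factor=2, noise_std=1.0, seed=0):
    """Upsample a square, real 2D array by `factor` in the Fourier domain.

    Hypothetical helper: the input spectrum fills the low-frequency bins,
    and the new high-frequency bins come from white Gaussian noise.
    """
    rng = np.random.default_rng(seed)
    n = x_low.shape[0]
    m = n * factor

    # Centered spectrum of the low-resolution input (DC at index n // 2).
    f_low = np.fft.fftshift(np.fft.fft2(x_low))

    # Spectrum of fresh white noise at the target resolution.
    noise = rng.normal(0.0, noise_std, size=(m, m))
    f_high = np.fft.fftshift(np.fft.fft2(noise))

    # Overwrite the central (low-frequency) block. The factor**2 scaling
    # compensates for the 1 / m**2 normalization of the inverse FFT, so
    # the mean intensity of the input is preserved.
    start = (m - n) // 2
    f_high[start:start + n, start:start + n] = f_low * factor**2

    # The Nyquist bins of an even-sized input break Hermitian symmetry
    # slightly, so keep only the real part of the result.
    return np.fft.ifft2(np.fft.ifftshift(f_high)).real

x = np.arange(16.0).reshape(4, 4)
y = spectral_expand(x, factor=2)
print(y.shape)
```

In a progressive sampler, a step like this would be applied at the timesteps where the resolution increases, with `noise_std` set to the noise level at that point in the trajectory; how the paper chooses those timesteps and noise levels is not specified in the abstract.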
