Colored Noise Diffusion Sampling

2026-05-28 · Computer Vision and Pattern Recognition

AI summary

The authors examine how diffusion models generate images step by step, noting that these models resolve coarse global shapes first and add fine details later. They point out that standard stochastic samplers inject the same white noise across all frequencies throughout this process, rather than concentrating it where the image is still unresolved, which wastes part of a finite noise budget. They propose Colored Noise Sampling (CNS), which shapes the injected noise by frequency and timestep so that it targets the detail levels still being formed. Without any retraining, and only by changing the sampling procedure, the method improves sample quality across several architectures (SiT, JiT, FLUX), including lower FID on ImageNet-256.

diffusion models, stochastic differential equations, spectral bias, noise injection, image synthesis, sampling methods, finite energy budget, FID score, Classifier-Free Guidance, ImageNet-256
Authors
Hadar Davidson, Noam Issachar, Sagie Benaim
Abstract
Diffusion models achieve state-of-the-art image synthesis, and their generative trajectories fundamentally exhibit a spectral bias: they resolve low-frequency global structure early and high-frequency fine detail later. Conventional stochastic differential equation (SDE) solvers fail to account for this dynamic, naively injecting uniform white noise throughout the entire process and thereby misallocating the finite energy budget. In this work, we establish a mathematical framework that recasts SDE inference as a targeted, frequency-decoupled energy transfer. Leveraging this framework, we introduce Colored Noise Sampling (CNS), a novel, training-free stochastic solver. Rather than injecting uniform white noise, CNS uses a dynamic, timestep- and frequency-dependent schedule that allocates injected energy more efficiently toward structurally unresolved frequency bands. By actively exploiting the model's inherent spectral bias, CNS systematically steers the generated distribution toward the true data manifold. Extensive experiments demonstrate that CNS significantly outperforms standard ODE and SDE baselines as a strictly plug-and-play, inference-time substitute for the sampler across diverse architectures (SiT, JiT, FLUX). Compared to standard sampling on ImageNet-256, CNS achieves substantial unguided FID reductions, from 8.26 to 6.27 on SiT-XL/2, from 32.39 to 26.69 on JiT-B/16, and from 11.88 to 8.31 on JiT-H/16, while yielding consistent relative FID improvements with Classifier-Free Guidance. The project page is available at https://hadardavidson.github.io/CNS/.
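The abstract describes CNS only at a high level. The minimal NumPy sketch below illustrates the general idea of substituting frequency-shaped noise for white noise in a stochastic solver step. Several parts of it are assumptions and are not taken from the paper: the sigmoid high-pass `spectral_weight` schedule, its linear cutoff sweep, the `progress` parameter, and the helper names `radial_frequency`, `colored_noise`, and `sde_step`. The sketch shows one way to move noise energy toward frequency bands treated as "unresolved" while keeping the total expected energy equal to that of unit white noise.

```python
import numpy as np


def radial_frequency(h: int, w: int) -> np.ndarray:
    """Radial spatial frequency |f| (cycles per pixel) on an h x w FFT grid."""
    fy = np.fft.fftfreq(h)[:, None]  # values in [-0.5, 0.5)
    fx = np.fft.fftfreq(w)[None, :]
    return np.sqrt(fx**2 + fy**2)  # 0 at DC, about 0.707 at the corners


def spectral_weight(freq: np.ndarray, progress: float, sharpness: float = 0.05) -> np.ndarray:
    """HYPOTHETICAL schedule, not the paper's: per-frequency amplitude weight.

    progress = 0 at the start of sampling (pure noise), 1 at the end.
    Frequencies below a cutoff are treated as already resolved and receive
    less noise; the cutoff sweeps linearly from 0 to 0.5 cycles/pixel.
    """
    cutoff = 0.5 * progress
    return 1.0 / (1.0 + np.exp(-(freq - cutoff) / sharpness))  # smooth high-pass


def colored_noise(shape, progress: float, rng: np.random.Generator) -> np.ndarray:
    """Gaussian noise filtered in the 2-D Fourier domain over the last two axes.

    The filter is rescaled so its mean squared gain is 1, which keeps the
    expected per-pixel variance at 1, the same as unit white noise.
    """
    h, w = shape[-2:]
    white = rng.standard_normal(shape)
    gain = spectral_weight(radial_frequency(h, w), progress)
    gain = gain / np.sqrt(np.mean(gain**2))  # energy-preserving normalization
    spectrum = np.fft.fft2(white, axes=(-2, -1)) * gain
    # gain is real and symmetric in frequency, so the inverse FFT is real up to rounding.
    return np.real(np.fft.ifft2(spectrum, axes=(-2, -1)))


def sde_step(x, drift, diffusion, dt, progress, rng):
    """One Euler-Maruyama-style update x + drift*dt + diffusion*sqrt(dt)*eps,
    with colored eps in place of white noise. `drift` is assumed to be an
    array already computed from the model; dt > 0 is the step size."""
    eps = colored_noise(x.shape, progress, rng)
    return x + drift * dt + diffusion * np.sqrt(dt) * eps


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((3, 64, 64))
    x_next = sde_step(x, drift=-x, diffusion=1.0, dt=0.01, progress=0.8, rng=rng)
    print(x_next.shape)  # (3, 64, 64)
```

Because the normalization keeps the expected per-pixel variance at 1, the sketch only redistributes noise between frequency bands rather than adding or removing it, which is one possible reading of the abstract's "finite energy budget". It omits any drift correction or other terms the actual CNS solver may use.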