Generative Drifting is Secretly Score Matching: a Spectral and Variational Perspective
2026-03-10 • Machine Learning
AI summary
The authors study Generative Modeling via Drifting, a method that generates images in a single step using a kernel-based drift operation. They show that, under a Gaussian kernel, this drift is exactly a difference of scores of smoothed distributions, which resolves key open questions about the method's theory. They analyze the method's training dynamics, explain why some kernels outperform others, and show why a stop-gradient step is necessary for stability. Their work connects the method to established tools such as score matching and Wasserstein gradient flows, and they propose improvements including annealing the kernel bandwidth over time and constructing new drift operators.
Generative Modeling · Drift Operator · Gaussian Kernel · Score Matching · McKean-Vlasov Dynamics · Landau Damping · Wasserstein Gradient Flow · KL Divergence · Stop-Gradient Operator · Sinkhorn Divergence
Authors
Erkan Turan, Maks Ovsjanikov
Abstract
Generative Modeling via Drifting has recently achieved state-of-the-art one-step image generation through a kernel-based drift operator, yet its success is largely empirical and its theoretical foundations remain poorly understood. In this paper, we make the following observation: \emph{under a Gaussian kernel, the drift operator is exactly a score difference on smoothed distributions}. This insight allows us to answer all three key questions left open in the original work: (1) whether a vanishing drift guarantees equality of distributions ($V_{p,q}=0\Rightarrow p=q$), (2) how to choose between kernels, and (3) why the stop-gradient operator is indispensable for stable training. Our observations position drifting within the well-studied score-matching family and enable a rich theoretical perspective. By linearizing the McKean-Vlasov dynamics and probing them in Fourier space, we reveal frequency-dependent convergence timescales comparable to \emph{Landau damping} in plasma kinetic theory: the Gaussian kernel suffers an exponential high-frequency bottleneck, explaining the empirical preference for the Laplacian kernel. We also propose an exponential bandwidth annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$ that reduces convergence time from $\exp(O(K_{\max}^2))$ to $O(\log K_{\max})$. Finally, by formalizing drifting as a Wasserstein gradient flow of the smoothed KL divergence, we prove that the stop-gradient operator is derived directly from the frozen-field discretization mandated by the JKO scheme, and removing it severs training from any gradient-flow guarantee. This variational perspective further provides a general template for constructing novel drift operators, demonstrated with a Sinkhorn divergence drift.
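The abstract's two quantitative claims can be illustrated concretely in one dimension. The sketch below is an assumption-laden toy, not the paper's operator: it takes the drift to be the difference of scores of Gaussian-smoothed densities (the form the abstract attributes to the Gaussian kernel) and checks it on Gaussian $p$ and $q$, where smoothing by a bandwidth-$\sigma$ Gaussian simply adds $\sigma^2$ to the variance. It also evaluates the proposed annealing schedule $\sigma(t)=\sigma_0 e^{-rt}$, whose time to shrink the bandwidth down to a target spatial scale $1/K$ grows only logarithmically in $K$.

```python
# Hedged toy model: 1-D Gaussians, where smoothed scores are available in
# closed form.  The function names and the exact form of V_{p,q} below are
# illustrative assumptions, not the paper's definitions.
import math

def smoothed_score(x, mean, var, sigma):
    """Score d/dx log density of N(mean, var) convolved with a Gaussian
    kernel of bandwidth sigma, i.e. the score of N(mean, var + sigma**2)."""
    return -(x - mean) / (var + sigma**2)

def drift(x, p_mean, q_mean, var, sigma):
    """Assumed drift: V_{p,q}(x) = score of smoothed p minus score of
    smoothed q (both with common variance var for simplicity)."""
    return (smoothed_score(x, p_mean, var, sigma)
            - smoothed_score(x, q_mean, var, sigma))

# Target p = N(0, 1), current model q = N(2, 1), bandwidth sigma = 0.5.
# For equal variances the drift is constant: (p_mean - q_mean)/(var + sigma^2),
# pushing samples of q toward p, and it vanishes iff p_mean == q_mean --
# a toy instance of V_{p,q} = 0 implying p = q within this family.
sigma, var = 0.5, 1.0
v = drift(x=1.0, p_mean=0.0, q_mean=2.0, var=var, sigma=sigma)  # -2/1.25 = -1.6

# Annealing schedule sigma(t) = sigma0 * exp(-r*t): the time at which the
# bandwidth reaches the scale 1/K of frequency K is log(sigma0*K)/r, i.e.
# O(log K) rather than the exp(O(K^2)) of a fixed Gaussian bandwidth.
sigma0, r = 1.0, 0.5
def t_resolve(K):
    """Time t with sigma(t) == 1/K under the exponential schedule."""
    return math.log(sigma0 * K) / r
```

With these toy parameters the drift is constant in $x$ and points from $q$'s mean toward $p$'s, and `t_resolve` makes the logarithmic dependence on the cutoff frequency explicit.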