Polynomial Speedup in Diffusion Models with the Multilevel Euler-Maruyama Method

2026-03-25

Machine Learning
AI summary

The authors introduce a method called Multilevel Euler-Maruyama (ML-EM) for solving equations that model random processes (SDEs and ODEs). Their method uses several approximations of the drift function with different accuracies, performing many evaluations of the cheap approximations and only a few of the costly ones to save compute. When the drift is very hard to approximate, ML-EM computes the solution with far less work than standard methods, at roughly the cost of a single evaluation of the most accurate approximation. They test this on image generation tasks and show speedups of up to four times on a small dataset, suggesting even larger improvements are possible for bigger problems.

Stochastic Differential Equations (SDE), Ordinary Differential Equations (ODE), Euler-Maruyama method, Drift function, Multilevel methods, Harder than Monte Carlo (HTMC) regime, UNet, Diffusion models, Image generation, Computational complexity
Authors
Arthur Jacot
Abstract
We introduce the Multilevel Euler-Maruyama (ML-EM) method to compute solutions of SDEs and ODEs using a range of approximators $f^1,\dots,f^k$ to the drift $f$ with increasing accuracy and computational cost, only requiring a few evaluations of the most accurate $f^k$ and many evaluations of the less costly $f^1,\dots,f^{k-1}$. If the drift lies in the so-called Harder than Monte Carlo (HTMC) regime, i.e. it requires $ε^{-γ}$ compute to be $ε$-approximated for some $γ>2$, then ML-EM $ε$-approximates the solution of the SDE with $ε^{-γ}$ compute, improving over the traditional EM rate of $ε^{-γ-1}$. In other words, it allows us to solve the SDE at the same cost as a single evaluation of the drift. In the context of diffusion models, the different levels $f^{1},\dots,f^{k}$ are obtained by training UNets of increasing sizes, and ML-EM allows us to perform sampling with the equivalent of a single evaluation of the largest UNet. Our numerical experiments confirm our theory: we obtain up to fourfold speedups for image generation on the CelebA dataset downscaled to 64x64, where we measure a $γ\approx2.5$. Given that this is a polynomial speedup, we expect even stronger speedups in practical applications which involve orders of magnitude larger networks.
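To make the multilevel idea concrete, the following is a minimal illustrative sketch (not the paper's exact scheme) of an Euler-Maruyama loop with a telescoped drift: the cheapest approximator `f1` is evaluated at every fine step, while each correction $f^l - f^{l-1}$ is re-evaluated only on a grid coarsened by a factor `thin` per level and frozen in between, so the expensive drifts are called rarely. The function name, the `thin` parameter, and the specific refresh schedule are assumptions made for illustration.

```python
import numpy as np

def multilevel_em(x0, drifts, T, n_steps, sigma, rng, thin=4):
    """Illustrative multilevel Euler-Maruyama sketch.

    drifts : list [f1, ..., fk] of drift approximators, cheapest first,
             each mapping an array x to an array of the same shape.
    At every fine step only drifts[0] is evaluated; the correction
    drifts[l] - drifts[l-1] is recomputed every thin**l steps and held
    fixed in between, so the costly approximators are called rarely.
    """
    dt = T / n_steps
    x = np.asarray(x0, dtype=float)
    # one frozen correction term per level above the cheapest
    corrections = [np.zeros_like(x) for _ in drifts[1:]]
    for step in range(n_steps):
        for l in range(1, len(drifts)):
            if step % thin**l == 0:  # coarse grid for level l
                corrections[l - 1] = drifts[l](x) - drifts[l - 1](x)
        drift = drifts[0](x) + sum(corrections)
        x = x + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(x.shape)
    return x

# Sanity check with identical levels (all corrections vanish) and no noise:
# the scheme reduces to plain Euler on dx/dt = -x.
rng = np.random.default_rng(0)
x_T = multilevel_em(np.array([1.0]), [lambda x: -x, lambda x: -x],
                    T=1.0, n_steps=1000, sigma=0.0, rng=rng)
```

With identical drift levels the corrections are exactly zero, so `x_T` approaches $e^{-1} \approx 0.368$; in the intended use case the higher levels are larger UNets and the corrections are small but nonzero.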