Bregman meets Lévy: Stochastic mirror descent with heavy-tailed noise in continuous and discrete time
2026-06-02 • Machine Learning
Machine Learning
AI summaryⓘ
The authors examine how stochastic mirror descent (SMD), an optimization method, behaves when the randomness in the process has very large, unpredictable spikes (heavy-tailed noise). They model SMD as a continuous process influenced by Lévy noise, a type of noise with jumps, to better understand its behavior under these conditions. Despite these jumps causing infinite variance, the authors prove that the method still converges to a good solution, though the speed depends on how heavy the noise is. Their results clarify the effect of big random jumps on convergence and apply to discrete versions of SMD as well.
Stochastic Mirror DescentHeavy-tailed noiseLévy processStochastic differential equationInfinite varianceConvex optimizationStrong convexityScaling limitJump discontinuitiesConvergence rate
Authors
Pierre-Louis Cauvin, Panayotis Mertikopoulos
Abstract
We study the robustness of stochastic mirror descent (SMD) under heavy-tailed noise, focusing on whether the method retains its convergence guarantees when run with infinite-variance stochastic gradient input. To address this question in a principled manner, we begin by introducing a continuous-time model of SMD as a stochastic differential equation (SDE) driven by a centered Lévy noise process with finite $p$-th order moments, $1 < p \leq 2$. This scheme -- which we call the Lévy mirror flow (LMF) -- arises naturally as the scaling limit of SMD in the presence of heavy-tailed noise. In particular, when $p < 2$ -- the heavy noise regime -- the trajectories of LMF generically exhibit jump discontinuities of arbitrary magnitude which, if frequent enough, lead to infinite variance. Nonetheless, despite this highly singular behavior, we show that LMF attains $ε$-optimality within $\mathcal{O}(ε^{-p/(p-1)})$ time in the convex case, and within $\mathcal{\tilde O}(ε^{-1/(p-1)})$ time for (relatively) strongly convex objectives. These guarantees provide a transparent characterization of the impact of frequent long jumps on the convergence of the process, and percolate to a series of matching discrete-time guarantees for several variants of SMD under heavy-tailed noise.