Statistical Inference for Stochastic Gradient Descent Beyond Finite Variance
2026-05-25 • Machine Learning
AI summary
The authors study how to build reliable confidence regions from stochastic gradient descent (SGD) when the stochastic gradients may be heavy-tailed, including the case of infinite variance. Their method does not require knowing or estimating nuisance parameters of the noise, such as the tail index. It combines the Polyak-Ruppert average of the SGD iterates with an empirical second-moment normalizer built from the stochastic gradients, so that the tail-dependent scaling terms cancel in the resulting self-normalized statistic. A subsampling scheme then estimates the critical values for the confidence regions. Simulation studies report reliable coverage across various settings.
Stochastic gradient descent, Confidence regions, Heavy-tailed noise, Polyak-Ruppert averaging, Weak convergence, Subsampling, Stable distributions, Stochastic optimization, Statistical inference, Asymptotic validity
Authors
Jose Blanchet, Peter Glynn, Wenhao Yang
Abstract
Stochastic gradient descent (SGD) is a foundational algorithm for large-scale statistical learning and stochastic optimization. However, statistical inference based on SGD iterates remains challenging when stochastic gradients have infinite variance, as the relevant limiting distributions depend on unknown nuisance parameters. In this paper, we develop an efficient, model-agnostic methodology for constructing confidence regions from SGD trajectories that applies in both finite- and infinite-variance regimes. The procedure is based on a joint weak convergence result for the Polyak-Ruppert averaged estimator and an empirical second-moment normalizer constructed from stochastic gradients along the SGD trajectory. This joint limit yields a self-normalized statistic in which the leading tail-dependent scaling terms cancel. We then use a subsampling calibration scheme to estimate the relevant critical values, avoiding explicit estimation of tail indices, slowly varying functions, or stable-law parameters. The resulting confidence regions are straightforward to implement and are asymptotically valid under both the finite- and infinite-second-moment regimes. Simulation studies show reliable coverage in various settings, supporting the proposed method as a practical tool for uncertainty quantification in stochastic optimization.
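The pipeline described in the abstract (averaged iterates, a second-moment normalizer built from the gradients, and subsampling for critical values) can be illustrated with a minimal one-dimensional sketch. This is not the authors' procedure. The quadratic objective, the Student-t noise with 1.5 degrees of freedom (infinite variance), the step-size schedule, the burn-in, the use of non-overlapping blocks centered at the full-sample average, and the function names `sgd_trajectory` and `self_normalized_ci` are all assumptions made only for illustration.

```python
import numpy as np


def sgd_trajectory(n, x0=0.0, x_star=1.0, tail_df=1.5, eta0=0.5, decay=0.75, seed=0):
    """Run 1-D SGD on f(x) = 0.5 * (x - x_star)**2 with Student-t gradient noise.

    Assumption for illustration: tail_df < 2 gives infinite-variance noise.
    Returns the iterates and the stochastic gradients observed along the path.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_t(tail_df, size=n)
    xs = np.empty(n)
    grads = np.empty(n)
    x = x0
    for k in range(n):
        g = (x - x_star) + noise[k]            # noisy gradient at the current iterate
        x = x - eta0 * (k + 1) ** (-decay) * g  # decaying step size eta0 * (k+1)^(-decay)
        xs[k] = x
        grads[k] = g
    return xs, grads


def self_normalized_ci(xs, grads, block=500, burn_in=1000, level=0.95):
    """Self-normalized interval with a simplified block-subsampling calibration.

    Assumed scheme (hypothetical, not taken from the paper): recompute the
    self-normalized statistic on non-overlapping blocks, centered at the
    full-sample average, and use its empirical quantile as the critical value.
    """
    xs, grads = xs[burn_in:], grads[burn_in:]   # drop the initial transient
    n = len(xs)
    x_bar = xs.mean()                           # Polyak-Ruppert average
    norm_n = np.sqrt(np.sum(grads ** 2))        # empirical second-moment normalizer

    n_blocks = n // block
    xb = xs[: n_blocks * block].reshape(n_blocks, block)
    gb = grads[: n_blocks * block].reshape(n_blocks, block)
    # Block-level statistic: block * (block average - x_bar) / block normalizer
    stats = block * (xb.mean(axis=1) - x_bar) / np.sqrt(np.sum(gb ** 2, axis=1))

    crit = np.quantile(np.abs(stats), level)    # subsampled critical value
    half_width = crit * norm_n / n
    return x_bar, (x_bar - half_width, x_bar + half_width)


xs, grads = sgd_trajectory(20_000)
x_bar, (lo, hi) = self_normalized_ci(xs, grads)
print(f"averaged estimate {x_bar:.4f}, 95% interval [{lo:.4f}, {hi:.4f}]")
```

The sketch never estimates a tail index or a stable-law parameter: the same normalizer appears in the full-sample statistic and in every block statistic, so the unknown scaling is handled by the subsampled quantile alone. The block length and burn-in here are arbitrary choices, and nothing in this sketch establishes coverage.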