Accelerating Min-Max Optimization via Power-Law Stepsizes
2026-06-01 • Computer Science and Game Theory
Computer Science and Game Theory • Machine Learning
AI summary
The authors revisit the Extragradient (EG) method for solving unconstrained biaffine min-max problems. With a fixed step size, EG's last iterate converges slowly, at a rate proportional to 1 divided by the square root of the number of steps. They show that a carefully chosen step-size schedule that changes over time speeds this up to a rate close to 1 divided by the number of steps raised to the power 2/3, which is the best achievable when both parts of each EG iteration share one step size. Using different step sizes for the two parts (extrapolation and update) improves the rate further to nearly 1 divided by the number of steps, close to the best possible. Their schedules and analysis also extend to related algorithms such as Optimistic Gradient, suggesting broader use in min-max optimization.
Extragradient method, min-max optimization, convergence rate, dynamic stepsizes, last-iterate convergence, power-law distribution, Optimistic Gradient, anchoring, unconstrained optimization
Authors
Yue Wu, Weiqiang Zheng, Yang Cai, Haipeng Luo
Abstract
We revisit the convergence guarantees of the Extragradient (EG) method for unconstrained biaffine min-max optimization. It is known that EG with a fixed stepsize achieves a $\Theta(T^{-1/2})$ last-iterate convergence rate, which is slower than the optimal $\mathcal{O}(T^{-1})$ rate attainable by incorporating additional mechanisms such as anchoring. Motivated by recent advances showing that dynamic stepsizes alone can significantly accelerate gradient descent, we ask whether dynamic stepsizes can similarly accelerate the last-iterate convergence of EG. We present the first positive result in this direction. Specifically, we provide a deterministic dynamic stepsize schedule that accelerates the convergence rate of EG to $\mathcal{O}(T^{-2/3+\varepsilon})$ for any $\varepsilon > 0$. We also show that this rate is tight when the extrapolation and update steps of EG use the same stepsize. We then show that allowing different stepsizes for the extrapolation and update steps further improves the convergence rate to the near-optimal $\mathcal{O}(T^{-1+\varepsilon})$. Our analysis reduces stepsize scheduling to an optimization problem, whose solution leads to a stepsize schedule that follows (a discretization of) a power-law distribution. Our proposed stepsize schedules and analysis extend to other methods, such as Optimistic Gradient (OG), and suggest broader applicability to general min-max optimization problems.
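To make the two-stepsize structure of EG concrete, the following is a minimal sketch, not the schedule from the paper. It assumes the scalar toy objective $f(x, y) = a\,x\,y$ (a hypothetical instance chosen for illustration), for which the operator is $F(x, y) = (a\,y, -a\,x)$, and it takes the extrapolation and update stepsizes as two separate lists so that any schedule can be plugged in. The function names `operator`, `extragradient`, and `grad_norm` are illustrative. Only the fixed-stepsize baseline is run here; the power-law schedule itself is not specified in the abstract and is left out. On this one-dimensional instance fixed-stepsize EG contracts geometrically, so the example shows the mechanics of the iteration rather than the worst-case rates quoted above.

```python
import math

def operator(x, y, a=1.0):
    """Return F(x, y) = (df/dx, -df/dy) for the toy objective f(x, y) = a*x*y."""
    return a * y, -a * x

def extragradient(x, y, extrap_steps, update_steps, a=1.0):
    """Run EG with per-iteration stepsizes and return the last iterate.

    extrap_steps[t] is used in the extrapolation step and update_steps[t]
    in the update step; passing identical lists gives single-stepsize EG.
    """
    for alpha, beta in zip(extrap_steps, update_steps):
        gx, gy = operator(x, y, a)
        xh, yh = x - alpha * gx, y - alpha * gy  # extrapolation step
        gx, gy = operator(xh, yh, a)
        x, y = x - beta * gx, y - beta * gy      # update step (from the old iterate)
    return x, y

def grad_norm(x, y, a=1.0):
    """Norm of the operator at (x, y), used here as the last-iterate error."""
    gx, gy = operator(x, y, a)
    return math.hypot(gx, gy)

# Fixed-stepsize baseline (eta is an arbitrary illustrative choice).
T = 200
eta = 0.5
x_T, y_T = extragradient(1.0, 1.0, [eta] * T, [eta] * T)
print(grad_norm(x_T, y_T))
```

A dynamic schedule would be supplied by replacing the two constant lists with per-iteration values, and the separate-stepsize variant by passing two different lists.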