Adaptive Sharpness-Aware Minimization with a Polyak-type Step size: A Theory-Grounded Scheduler

2026-06-01 • Machine Learning


AI summary

The authors aim to improve Sharpness-Aware Minimization (SAM), an optimizer that helps machine learning models generalize better by explicitly minimizing the sharpness of the loss landscape. They derive adaptive step-size rules, called Polyak schedulers, tailored to SAM-style updates. For smooth objectives in the deterministic setting, they prove linear convergence in the strongly convex case and an $\mathcal{O}(1/T)$ rate in the convex case; in the stochastic setting, they establish analogous guarantees up to a neighborhood of the optimum. Experiments show their approach performs as well as or better than carefully tuned SAM baselines while requiring substantially less learning-rate tuning.

Sharpness-Aware Minimization, loss landscape, Polyak step size, adaptive learning rate, stochastic gradient descent, strongly convex, convergence rate, deterministic optimization, stochastic optimization

Authors

Dimitris Oikonomou, Nicolas Loizou

Abstract

Sharpness-Aware Minimization (SAM) has established itself as a powerful and widely adopted optimizer for training machine learning models. By explicitly minimizing the sharpness of the loss landscape, SAM often improves generalization while delivering strong empirical performance. However, SAM and its variants, like most training algorithms, are sensitive to the choice of learning rate, which is typically selected through extensive hyperparameter tuning or predefined schedulers. In this work, motivated by recent advances on the effectiveness of stochastic Polyak step sizes for Stochastic Gradient Descent (SGD), we derive Polyak schedulers tailored to SAM-style updates, yielding novel adaptive algorithms in both deterministic and stochastic settings. In the smooth setting, we prove linear convergence for strongly convex objectives and an $\mathcal{O}(1/T)$ convergence rate for convex objectives in the deterministic case. In the stochastic setting, we establish analogous convergence guarantees up to a neighborhood of the optimum. Numerical experiments demonstrate that the proposed Polyak schedulers achieve performance comparable to or better than carefully tuned SAM baselines, while substantially reducing the need for learning-rate tuning.
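The abstract does not state the scheduler formulas, so the following is only a minimal sketch of the general idea, not the authors' algorithm. It naively plugs the SAM gradient into the classical Polyak step size $\gamma_t = (f(x_t) - f^*) / \|g_t\|^2$. The function name `sam_polyak_sketch`, the cap `gamma_max`, the radius `rho`, and the assumption that the optimal value `f_star` is known are all illustrative choices that do not come from the paper.

```python
import numpy as np

def sam_polyak_sketch(loss, grad, x0, f_star=0.0, rho=0.05,
                      gamma_max=1.0, steps=200, eps=1e-12):
    """Deterministic SAM-style update with a naive Polyak-type step size.

    Illustrative only: this is NOT the scheduler derived in the paper.
    Assumes the optimal value f_star is known.
    """
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for _ in range(steps):
        g = grad(x)
        # SAM ascent step: move to the perturbed point along the normalized gradient.
        x_adv = x + rho * g / (np.linalg.norm(g) + eps)
        g_sam = grad(x_adv)
        # Classical Polyak ratio, evaluated with the SAM gradient and capped at gamma_max.
        gap = max(loss(x) - f_star, 0.0)
        gamma = min(gamma_max, gap / (np.dot(g_sam, g_sam) + eps))
        # Descent step using the gradient taken at the perturbed point.
        x = x - gamma * g_sam
        history.append(x.copy())
    return np.array(history)

# Toy strongly convex quadratic with minimizer at the origin and f_star = 0.
A = np.diag([1.0, 10.0])
loss = lambda x: 0.5 * x @ A @ x
grad = lambda x: A @ x

traj = sam_polyak_sketch(loss, grad, x0=[3.0, -2.0])
print("initial loss:", loss(traj[0]))
print("final loss:  ", loss(traj[-1]))
```

No learning rate is tuned by hand here: the step size is computed from the current loss gap and the SAM gradient norm at every iteration. On this toy quadratic the distance to the minimizer shrinks at each step, but the example says nothing about the rates proved in the paper.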
