Stochastic convergence of parallel asynchronous adaptive first-order methods

2026-06-01Artificial Intelligence

Artificial Intelligence
AI summary

The authors introduce a new group of optimization methods that work without needing all parts to be synchronized, making them faster for certain problems. These methods adapt as they learn, include options like momentum to speed up progress, and can handle some inaccuracies in calculations. They prove that these methods will still find good solutions even for complicated problems that aren't shaped simply, with performance improving as more steps are taken. Tests show these methods work well, especially in big and diverse machine learning setups where different parts run at different speeds.

asynchronous optimizationadaptive methodsmomentumnon-convex functionsstochastic convergencemachine learninglarge-scale systemsfirst-order optimizationnormalization
Authors
Serge Gratton, Philippe L. Toint
Abstract
A new class of asynchronous adaptive first-order optimization methods is introduced, comprising asynchronous variants of several popular algorithms. Versions of these methods using momentum and/or inexact normalization are also considered. The convergence of methods in the class on non-convex functions is analyzed in a fully stochastic setting, and is shown to be (up to logarithmic factors) of order O(1/sqrt{t}) under reasonable assumptions. Numerical experiments suggest that such asynchronous adaptive algorithms are very relevant in heterogeneous large-scale machine learning systems.