FOAM: Frequency and Operator Error-Based Adaptive Damping Method for Reducing Staleness-Oriented Error for Shampoo

2026-06-01

Machine Learning, Artificial Intelligence
AI summary

The authors study Shampoo, an optimizer used to speed up training of large machine learning models. Shampoo is slowed by the large matrix inversions it must perform, so practitioners refresh its preconditioner less often and reuse a stale one in between. This saves time but degrades convergence and introduces numerical instability. The authors analyze this trade-off and find that damping, which acts as a numerical stabilizer, suppresses these effects. They then propose FOAM, which adapts both the damping factor and the eigendecomposition frequency to an approximation of the staleness error, reducing wall-clock time relative to standard Shampoo while maintaining robust convergence.

Shampoo optimizer, matrix inversion, preconditioner, damping, staleness, convergence, numerical stability, eigendecomposition, adaptive algorithms, large-scale optimization
Authors
Kyunghun Nam, Sumyeong Ahn
Abstract
Shampoo is attracting considerable attention for its superior performance on large-scale optimization benchmarks; yet it faces a significant practical bottleneck: the prohibitive computational overhead of matrix inversion. To mitigate this, practitioners typically rely on stale preconditioner updates, creating a fundamental trade-off between computational efficiency and optimization fidelity. In this work, we provide a theoretical study of staleness through the complementary lenses of convergence and stability. While staleness improves computational efficiency, it inherently degrades performance and introduces numerical instability. Crucially, we identify that damping, acting as a numerical stabilizer, can effectively suppress these negative effects. Guided by this analysis, we propose FOAM, an adaptive algorithm that stabilizes training by dynamically controlling both the damping factor and the eigendecomposition frequency based on an approximation of the staleness-oriented error. Experimental results demonstrate that FOAM reduces wall-clock time compared to standard Shampoo while maintaining robust convergence.
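The interaction between staleness and damping described above can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes the standard Shampoo form in which a preconditioner is the inverse fourth root of a damped gradient statistic, (L + εI)^(-1/4). The error measure `staleness_error`, the controller `control`, and all thresholds are hypothetical names and values chosen for illustration. The sketch also computes the error exactly, which requires the fresh eigendecomposition that staleness is meant to avoid. The abstract states that FOAM relies on an approximation instead but does not specify it.

```python
import numpy as np


def inv_root(stat, damping, power=4):
    """Return (stat + damping * I)^(-1/power) for a symmetric PSD matrix."""
    eigvals, eigvecs = np.linalg.eigh(stat + damping * np.eye(stat.shape[0]))
    eigvals = np.maximum(eigvals, 1e-30)  # guard against round-off below zero
    # V diag(lambda^(-1/power)) V^T
    return (eigvecs * eigvals ** (-1.0 / power)) @ eigvecs.T


def staleness_error(fresh_stat, stale_stat, damping):
    """Hypothetical proxy: relative Frobenius gap between the preconditioner
    built from stale statistics and the one built from fresh statistics."""
    fresh = inv_root(fresh_stat, damping)
    stale = inv_root(stale_stat, damping)
    return np.linalg.norm(stale - fresh) / np.linalg.norm(fresh)


def control(fresh_stat, stale_stat, damping, tol=0.2, max_damping=1.0, growth=10.0):
    """Hypothetical controller (not the rule from the paper).

    Raise damping while the error exceeds `tol`. If the damping cap would be
    exceeded, keep the current damping and request a fresh eigendecomposition.
    Returns (damping, refresh_needed).
    """
    while staleness_error(fresh_stat, stale_stat, damping) > tol:
        if damping * growth > max_damping:
            return damping, True
        damping *= growth
    return damping, False


# Toy statistics: accumulate G G^T from random "gradients".
rng = np.random.default_rng(0)
stale_L = np.zeros((8, 8))
for _ in range(5):  # statistic at the time of the last eigendecomposition
    G = rng.standard_normal((8, 4))
    stale_L += G @ G.T

fresh_L = stale_L.copy()
for _ in range(20):  # statistic after 20 more steps without a refresh
    G = rng.standard_normal((8, 4))
    fresh_L += G @ G.T

# Error of the stale preconditioner under increasing damping.
errors = {eps: staleness_error(fresh_L, stale_L, eps)
          for eps in (1e-6, 1e-2, 1.0, 1e2, 1e4)}

if __name__ == "__main__":
    for eps, err in errors.items():
        print(f"damping={eps:g}  relative staleness error={err:.4f}")
    print(control(fresh_L, stale_L, damping=1e-6))
```

In this toy setting, very large damping pulls both the stale and the fresh preconditioner toward a scaled identity, so their gap shrinks. The cost is that heavy damping also discards curvature information, which is why a controller of this kind would have to trade damping against refresh frequency rather than rely on damping alone.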