Optimizer-Induced Mode Connectivity: From AdamW to Muon
2026-05-11 • Artificial Intelligence
Artificial Intelligence, Machine Learning
AI summary
The authors studied how different optimization methods affect the way neural network solutions are connected. They found that for certain networks, solutions found by the same optimizer form a connected group when the network is large enough. Different optimizers can lead to solutions that are either overlapping or separated by gaps depending on factors like regularization. Experiments with GPT-2 models showed that paths connecting models trained by the same optimizer keep similar characteristics, while paths connecting different optimizers show smooth changes. This work highlights that the choice of optimizer influences the structure of solutions beyond what was known before.
mode connectivity, optimizer, implicit regularization, ReLU networks, AdamW, Muon optimizer, loss landscape, regularization, GPT-2, neural network training
Authors
Fangzhao Zhang, Sungyoon Kim, Erica Zhang, Yiqi Jiang, Mert Pilanci
Abstract
Mode connectivity has been widely studied, yet the role of the optimizer remains underexplored. We revisit it through optimizer-induced implicit regularization, asking how connectivity behaves when restricted to solutions constrained by a given optimizer. For two-layer ReLU networks, we show that solutions from a single optimizer -- AdamW, Muon, or others in the Lion-$\mathcal{K}$ family -- form a connected set at sufficiently large width, a result not implied by prior work. We then characterize how optimizer-induced regions interact: at large width, two different regions can be disjoint or overlap depending on regularization, while in our small-width example AdamW and Muon converge to disconnected zero-loss components separated by a provable loss barrier. Empirically, in GPT-2 pretraining, we observe that same-optimizer paths preserve each model's spectrum, while cross-optimizer paths traverse a smooth transition. Our results reveal optimizer-dependent structure beyond the classical mode connectivity literature.
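To make the notion of a loss barrier concrete, here is a minimal sketch that evaluates the loss along the straight line between two zero-loss two-layer ReLU networks. It is not the authors' construction: the toy data, the function names (`two_layer_relu`, `linear_path_losses`, `loss_barrier`), and the barrier definition (maximum excess of the path loss over the linear interpolation of the endpoint losses) are all assumptions chosen for illustration. The two endpoints here differ only by a neuron permutation rather than by optimizer, and a barrier on the linear path does not by itself show that the two solutions are disconnected, since a curved path might avoid it.

```python
import numpy as np

def two_layer_relu(X, W, a):
    """f(x) = sum_j a_j * relu(w_j . x). X: (n, d), W: (m, d), a: (m,)."""
    return np.maximum(X @ W.T, 0.0) @ a

def mse(X, y, W, a):
    """Mean squared error of the network on (X, y)."""
    return float(np.mean((two_layer_relu(X, W, a) - y) ** 2))

def linear_path_losses(X, y, theta0, theta1, num_points=11):
    """Loss at evenly spaced points on the segment between theta0 and theta1."""
    ts = np.linspace(0.0, 1.0, num_points)
    losses = []
    for t in ts:
        W = (1 - t) * theta0[0] + t * theta1[0]
        a = (1 - t) * theta0[1] + t * theta1[1]
        losses.append(mse(X, y, W, a))
    return ts, np.array(losses)

def loss_barrier(losses):
    """Assumed definition: max excess over the interpolated endpoint losses."""
    ts = np.linspace(0.0, 1.0, len(losses))
    baseline = (1 - ts) * losses[0] + ts * losses[-1]
    return float(np.max(losses - baseline))

# Toy target y = x, which equals relu(x) - relu(-x), so width 2 fits it exactly.
X = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
y = X[:, 0]

# Two zero-loss solutions that differ by swapping the two hidden neurons.
theta_a = (np.array([[1.0], [-1.0]]), np.array([1.0, -1.0]))
theta_b = (np.array([[-1.0], [1.0]]), np.array([-1.0, 1.0]))

ts, losses = linear_path_losses(X, y, theta_a, theta_b)
barrier = loss_barrier(losses)
print("endpoint losses:", losses[0], losses[-1])
print("barrier on the linear path:", barrier)
```

In this toy case the midpoint of the segment has all-zero weights, so the network outputs zero there and the barrier equals the mean of `y**2`. The same helpers could in principle be pointed at checkpoints trained with different optimizers, with the parameters flattened into `(W, a)` pairs, but that usage is an assumption and is not described in the abstract.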