Characterizing Optimizer-Dependent Training Dynamics Through Hessian Eigenvector Displacement and Localization
2026-06-29 • Machine Learning
AI summary
The authors studied how the leading directions of curvature in a neural network's loss landscape (given by the top Hessian eigenvectors) change during training under different optimizers. They found that with SGD these directions become increasingly stable over time, whereas with Adam they reorganize much more strongly throughout training and tend to concentrate on a small subset of parameters. This suggests that the choice of optimizer strongly shapes how training unfolds, and that tracking these eigenvectors helps reveal those differences.
Hessian matrix, eigenvalues, eigenvectors, curvature, neural networks, optimizer, SGD, Adam, localization, inverse participation ratio
Authors
Marcelina Marjankowska, Valerio Modugno, Paolo Barucca
Abstract
Hessian spectral properties are a standard tool in analysing neural-network training, with eigenvalues linked to sharpness, generalization, and optimization dynamics. Eigenvalues quantify curvature magnitude, while eigenvectors identify which parameters generate that curvature. In this work, we study how the leading Hessian eigenvectors evolve during training and how they affect the learning trajectories. We track the training dynamics of multilayer perceptrons on a classification problem and measure eigenvector dynamics through two complementary statistics: (i) displacement over time, inspired by analyses of glassy systems, and (ii) localization via the inverse participation ratio. The metrics are compared against a random null model of the Hessian induced by the architecture. Our results reveal clear optimizer-dependent behaviour. SGD leads to progressively more stable leading curvature directions, while Adam exhibits substantially stronger reorganization of eigenvectors throughout training. We also observe a localization phenomenon under Adam, where a small subset of parameters contributes disproportionately to the leading curvature directions. These results suggest that Hessian eigenvector dynamics capture key differences in optimizer behaviour and the resulting training trajectories.
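The two eigenvector statistics mentioned above can be illustrated on a generic symmetric matrix. The following is a minimal sketch, assuming a Hessian small enough to form densely and diagonalize with `numpy.linalg.eigh`. The inverse participation ratio uses the standard definition, the sum of v_i^4 over the components of a unit-norm eigenvector. The displacement measure (one minus the normalized overlap between the leading k-dimensional subspaces at two training times) and the isotropic baseline 3/(n+2), which is the expected IPR of a uniformly random unit vector in n dimensions, are illustrative assumptions. They are not the authors' glassy-systems displacement or their architecture-induced null model. All function names are hypothetical, and the "Hessians" in the demo are random symmetric matrices rather than network Hessians.

```python
import numpy as np


def leading_eigvecs(H, k):
    """Top-k eigenpairs of a symmetric matrix H, largest eigenvalue first.

    Eigenvectors are returned as the columns of the second output.
    """
    evals, evecs = np.linalg.eigh(H)          # eigh returns ascending eigenvalues
    order = np.argsort(evals)[::-1][:k]       # indices of the k largest
    return evals[order], evecs[:, order]


def ipr(v):
    """Inverse participation ratio sum_i v_i^4 of the unit-normalized vector v.

    Equals 1 when all weight sits on a single parameter and 1/n when the
    weight is spread evenly over n parameters.
    """
    v = np.asarray(v, dtype=float)
    v = v / np.linalg.norm(v)
    return float(np.sum(v ** 4))


def subspace_displacement(V_a, V_b):
    """Illustrative displacement between two k-dim eigenvector subspaces (assumption).

    1 - ||V_a^T V_b||_F^2 / k: 0 if the subspaces coincide, 1 if orthogonal.
    Insensitive to eigenvector signs and to rotations within the subspace.
    """
    k = V_a.shape[1]
    overlap = np.linalg.norm(V_a.T @ V_b, "fro") ** 2 / k
    return float(1.0 - overlap)


def random_ipr_baseline(n):
    """Expected IPR of a unit vector drawn uniformly from the sphere in R^n."""
    return 3.0 / (n + 2)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, k = 200, 5
    # Stand-in "Hessians" at two training times: a random symmetric matrix
    # and a small symmetric perturbation of it (not a real network Hessian).
    A = rng.standard_normal((n, n))
    H_early = (A + A.T) / 2
    E = rng.standard_normal((n, n))
    H_late = H_early + 0.05 * (E + E.T) / 2

    _, V_early = leading_eigvecs(H_early, k)
    _, V_late = leading_eigvecs(H_late, k)
    print("IPR of leading eigenvectors:", [round(ipr(v), 4) for v in V_early.T])
    print("random-vector baseline:", round(random_ipr_baseline(n), 4))
    print("displacement:", round(subspace_displacement(V_early, V_late), 4))
```

Under this simple baseline, leading eigenvectors whose IPR sits well above 3/(n+2) would be read as localized on a few parameters. For realistic networks the Hessian is usually too large to form densely, so the leading eigenvectors are typically obtained from Hessian-vector products with an iterative method such as Lanczos.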