Flag Varieties: A Geometric Framework for Deep Network Alignment

2026-05-11

Machine Learning, Artificial Intelligence
AI summary

The authors study how layers in deep neural networks tend to align their weight matrices in certain geometric ways, which affects how information flows and how different networks behave. They use advanced math (geometric invariant theory) to show that this alignment follows precise structures rather than being random. Their work explains why certain patterns, like Neural Collapse, appear naturally and how regularization and nonlinearities influence alignment. They also provide new tools to analyze these patterns directly from network weights without needing to run data through the network. Their experiments on various architectures support their theoretical insights.

Keywords
weight matrix alignment, neural collapse, geometric invariant theory, subspace intersection, flag variety, ridge regularization, nonlinear activation, commutator obstruction, deep learning, representation similarity
Authors
Jingchuan Xiao, Xinyi Sui, Cihan Ruan
Abstract
Alignment, the tendency of adjacent weight matrices in deep networks to develop compatible subspace orientations, underlies gradient flow, Neural Collapse, and representation similarity across architectures. Despite extensive empirical documentation, these phenomena have resisted unified theoretical treatment: existing explanations are post-hoc, each fitted to a specific observation with whatever mathematics is at hand. We reverse this direction by deriving the mathematical structure that layerwise alignment inherently demands. Using geometric invariant theory, we prove that alignment geometry has a canonical closed, polystable stratum given by a flag variety, and that subspace intersection dimension is its unique reparameterization-invariant observable, establishing that subspace metrics are not empirical conventions but mathematical necessities. This unified framework yields two dynamical consequences: ridge regularization drives subspace alignment at an exponential rate set by weight decay, whereas nonlinear activations induce a commutator obstruction to exact basis alignment, generically present in nonlinear networks and absent in linear ones. Together these give a geometric explanation of the Level-2/3 hierarchy in Neural Collapse from first principles rather than post-hoc analysis. The commutator magnitude and head subspace overlap further serve as weight-space windows into internal alignment structure, requiring no forward passes. Experiments on multilayer perceptrons, residual networks, and pretrained language models support the proposed diagnostics and delineate their scope.
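The abstract proposes subspace overlap between adjacent layers as a weight-space diagnostic needing no forward passes. As a rough illustration of the idea (not the authors' code: the function names, the top-k truncation, and the 0.99 threshold are assumptions for this sketch), one can compare the top output directions of one weight matrix with the top input directions of the next via principal angles:

```python
import numpy as np

def subspace_overlap(W_out, W_in, k):
    """Cosines of principal angles between the top-k output subspace of
    layer l (left singular vectors of W_out, shape (d1, d0)) and the
    top-k input subspace of layer l+1 (right singular vectors of W_in,
    shape (d2, d1)). Both subspaces live in R^{d1}; cosines near 1
    indicate aligned directions.
    """
    U, _, _ = np.linalg.svd(W_out)        # columns: output directions of layer l
    _, _, Vt = np.linalg.svd(W_in)        # rows: input directions of layer l+1
    A, B = U[:, :k], Vt[:k, :].T          # orthonormal bases in R^{d1}
    # Singular values of A^T B are the principal-angle cosines.
    return np.linalg.svd(A.T @ B, compute_uv=False)

def intersection_dim(cosines, tol=0.99):
    """Effective subspace intersection dimension: count of near-unit cosines."""
    return int(np.sum(cosines >= tol))
```

For example, if the second layer's input subspace is constructed to coincide with the first layer's output subspace, all k cosines are 1 and the effective intersection dimension equals k; for independently random weights it is generically smaller.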