The Geometric Wall: Manifold Structure Predicts Layerwise Sparse Autoencoder Scaling Laws

2026-05-11

Machine Learning · Artificial Intelligence
AI summary

The authors studied sparse autoencoders (SAEs), which try to represent neural network activations as combinations of simple building blocks under the assumption that the activation space is flat and linear. They found that this assumption fails because the activation space is curved and its shape changes across layers, causing reconstruction errors to vary in ways that previous scaling laws could not explain. By analyzing many layers from two large models, the authors showed that the geometry of this curved space predicts how well SAEs perform at each layer, and that these findings transfer between models. This means that limitations of sparse autoencoders come from the shape of the activation space itself, not just from model size or resource limits.

Sparse autoencoder · Linear representation hypothesis · Activation manifold · Intrinsic dimension · Curvature · Reconstruction error · Scaling laws · Residual stream · Gemma model · Width-sparsity scaling
Authors
Eslam Zaher, Maciej Trzaskowski, Quan Nguyen, Fred Roosta
Abstract
Sparse autoencoders (SAEs) operationalise the linear representation hypothesis: they reconstruct model activations as sparse linear combinations of interpretable dictionary atoms, on the implicit assumption that activation space is well approximated by a globally linear structure. Their reconstruction error varies sharply across layers in ways that existing scaling laws, fitted at single layers, do not explain. We argue that this variation is the empirical trace of a geometric mismatch: where the activation manifold is curved and its intrinsic dimension varies across layers, no sparse linear dictionary can match it uniformly, and the SAE's width-sparsity scaling becomes a layer-dependent function of manifold structure rather than a single universal law. We conduct the first cross-layer SAE scaling study, fitting and regressing on 844 residual-stream Gemma Scope SAE checkpoints across 68 layers of Gemma 2 2B and 9B. Stage 1 fits a per-layer scaling-law surface; Stage 2 regresses the fitted parameters and the derived per-layer width exponents on four layerwise geometric summaries. We find that manifold geometry predicts the per-layer width exponent in both models, and that the same regression coefficients learnt on one model predict the other model's per-layer exponents under cross-model transfer, indicating a transferable geometric law. At the showcase layers where richer width grids permit identification of the asymptotic floor, we find that the fitted floor tracks the layerwise geometric ordering: higher curvature and intrinsic dimension correspond to higher floor, consistent with the irreducible second-order residual that any sparse linear approximation of a curved manifold must leave behind. SAEs thus encounter not a finite-resource ceiling but a geometry-dependent wall, set by the manifold they are trying to reconstruct.
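The two-stage pipeline the abstract describes can be sketched on synthetic data roughly as follows. This is an illustrative reconstruction, not the authors' code: the power-law-plus-floor form of the scaling law, the two stand-in geometric features, and all constants below are assumptions for demonstration only.

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(0)

# Assumed per-layer scaling law: error(W) = c * W**(-alpha) + floor,
# where W is SAE width, alpha the width exponent, floor the asymptote.
def scaling_law(W, c, alpha, floor):
    return c * W ** (-alpha) + floor

n_layers = 6
widths = np.array([2.0 ** k for k in range(12, 19)])  # SAE width grid

# Stand-in layerwise geometric summaries (e.g. intrinsic dimension and
# curvature estimates); the linear dependence of the true exponent on
# geometry is a modelling assumption made for this toy example.
geom = rng.uniform(0.0, 1.0, size=(n_layers, 2))
true_alpha = 0.5 - 0.2 * geom[:, 0] - 0.1 * geom[:, 1]

# Stage 1: fit the scaling-law surface independently at each layer.
fitted_alpha = []
for l in range(n_layers):
    errs = scaling_law(widths, 3.0, true_alpha[l], 0.05)
    errs = errs * (1 + 0.01 * rng.standard_normal(len(widths)))  # noise
    (c, alpha, floor), _ = curve_fit(
        scaling_law, widths, errs, p0=(1.0, 0.3, 0.01),
        bounds=([0, 0, 0], [np.inf, 2, 1]), maxfev=10000)
    fitted_alpha.append(alpha)
fitted_alpha = np.array(fitted_alpha)

# Stage 2: regress the fitted per-layer exponents on the geometric
# summaries (intercept plus the two features).
X = np.column_stack([np.ones(n_layers), geom])
coef, *_ = np.linalg.lstsq(X, fitted_alpha, rcond=None)
pred_alpha = X @ coef
```

Cross-model transfer in this picture amounts to reusing `coef`, learnt on one model's layers, with a second model's geometric summaries `X` and checking how well the resulting `pred_alpha` matches that model's independently fitted exponents.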