Compressed Computation is (probably) not Computation in Superposition

2026-06-12, Machine Learning

AI summary

The authors examine whether the Compressed Computation (CC) model genuinely computes more functions than it has neurons, that is, whether it performs computation in superposition. They find that the model's better-than-expected loss comes from inputs being mixed through the noisy residual stream, which acts as an unintended mixing matrix in the labels, rather than from genuine computation in superposition. The performance gains scale with the magnitude of this mixing matrix and vanish when it is removed, and the learned neuron directions concentrate in the subspace associated with its top 50 eigenvalues. A semi-non-negative matrix factorization (SNMF) baseline built only from the mixing matrix reproduces the qualitative loss profile but does not match the trained model. The authors therefore conclude that CC is not a suitable toy model of computation in superposition.

Compressed Computation, Computation in Superposition, ReLU Function, Mixing Matrix, Residual Stream, Eigenvalues, Semi-Non-Negative Matrix Factorization, Neural Networks, Loss Function, Neuron Directions
Authors
Jai Bhagat, Sara Molas-Medina, Giorgi Giglemiani, Stefan Heimersheim
Abstract
We study whether the Compressed Computation (CC) toy model (Braun et al., 2025) is an instance of computation in superposition. The CC model appears to compute 100 ReLU functions with just 50 neurons, achieving a better loss than expected from only representing 50 ReLU functions. We show that the model mixes inputs via its noisy residual stream, corresponding to an unintended mixing matrix in the labels. Splitting the training objective into the ReLU term and the mixing term, we find that performance gains scale with the magnitude of the mixing matrix and vanish when the matrix is removed. The learned neuron directions concentrate in the subspace associated with the top 50 eigenvalues of the mixing matrix, suggesting that the mixing term governs the solution. Finally, a semi-non-negative matrix factorization (SNMF) baseline derived solely from the mixing matrix reproduces the qualitative loss profile and improves on prior baselines, though it does not match the trained model. These results suggest CC is not a suitable toy model of computation in superposition.
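To make the decomposition described in the abstract concrete, here is a minimal NumPy sketch of labels split into a ReLU term and a mixing term, plus the subspace spanned by the top eigenvectors of the mixing matrix. The abstract does not give the exact form of the mixing matrix, so everything specific here is an assumption for illustration: the construction `M = W W^T - I` from a random unit-norm embedding `W`, the residual width of 1000, the `scale` knob, the choice of "top" as largest eigenvalue by value, and all function names. It is not the authors' implementation.

```python
import numpy as np


def make_mixing_matrix(n_features=100, d_resid=1000, seed=0):
    """Hypothetical mixing matrix: feature overlaps from a random embedding.

    ASSUMPTION: M = W W^T - I, where W has unit-norm rows. This is one
    plausible way a noisy residual stream could mix inputs; the abstract
    does not specify the form.
    """
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_features, d_resid))
    W /= np.linalg.norm(W, axis=1, keepdims=True)  # unit-norm rows
    return W @ W.T - np.eye(n_features)  # symmetric, zero diagonal


def split_labels(x, M, scale=1.0):
    """Return the two label terms separately: ReLU(x) and scale * M x."""
    relu_term = np.maximum(x, 0.0)
    mixing_term = scale * (x @ M.T)  # scale=0 removes the mixing entirely
    return relu_term, mixing_term


def top_eigenspace(M, k=50):
    """Eigenvalues and eigenvectors for the k largest eigenvalues of symmetric M."""
    eigvals, eigvecs = np.linalg.eigh(M)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]  # indices of the k largest
    return eigvals[order], eigvecs[:, order]


M = make_mixing_matrix()
x = np.random.default_rng(1).uniform(-1.0, 1.0, size=(8, 100))
relu_term, mixing_term = split_labels(x, M)
labels = relu_term + mixing_term  # full training target under this assumption
top_vals, top_vecs = top_eigenspace(M, k=50)
```

Under these assumptions, sweeping `scale` from 0 to 1 is the kind of experiment that would show whether performance gains track the magnitude of the mixing term, and projecting learned neuron directions onto `top_vecs` would show how much of their norm lies in the top-50 subspace.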