Trajectory Geometry of Transformer Representations Across Layers
2026-06-08 • Machine Learning
AI summary
The authors studied how representations change as they pass through the layers of transformer models by treating the forward pass as a path through a high-dimensional space. They found that semantically related inputs trace paths that move closer together in the middle-to-late layers, which they interpret as attractor-like dynamics. Reasoning tasks caused these paths to bend more than simple lexical variations did, suggesting that curvature tracks computational complexity. They also observed that ambiguous words cause paths to split apart by the final layer, an effect absent for unambiguous words. Finally, they report a three-phase pattern (encoding, elaboration, output preparation) in how representations change across layers, shared by all three model families tested.
Transformer models, Representation manifold, Trajectory geometry, Semantic convergence, Curvature, Cosine similarity, Attractor dynamics, Mechanistic interpretability, Layerwise representation, Model-agnostic analysis
Authors
Vishal Pandey, Gopal Singh
Abstract
Understanding how transformer representations evolve across layers, not merely what they encode, remains an open problem in mechanistic interpretability. We recast the transformer forward pass as a discrete population trajectory through a high-dimensional representation manifold, drawing on geometric tools from computational neuroscience. Rather than probing for pre-specified features, we characterize trajectory geometry using five metrics computed directly in the ambient space: trajectory length, curvature, a semantic convergence index (CI), layerwise cosine similarity, and representational stability. Across three model families (GPT-2, TinyLlama, Qwen2.5) and five controlled prompt families, we report four findings. First, semantically related prompts converge significantly in middle-to-late layers (peak CI 0.41–0.58, p < 0.001, Mann-Whitney U), consistent with attractor-like dynamics. Second, reasoning tasks produce trajectories of greater curvature than lexical variations (0.71–0.83 rad vs. 0.27–0.31 rad), suggesting curvature encodes computational complexity. Third, ambiguous tokens exhibit trajectory bifurcation with up to 5.6× representational separation by the final layer, absent in unambiguous controls. Fourth, layerwise cosine similarity reveals a universal three-phase structure (encoding, elaboration, and output preparation) that is consistent across all three architectures. All four effects vanish under shuffled-layer and random-embedding controls. We release a fully open-source, model-agnostic pipeline and argue that trajectory geometry constitutes a principled, probe-free lens for mechanistic interpretability.
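To make the metrics named in the abstract concrete, here is a minimal NumPy sketch of four of them. The abstract does not give formulas, so every definition below is an assumption chosen for illustration and may differ from the authors' pipeline: trajectory length as the sum of Euclidean step sizes, curvature as the mean turning angle between consecutive steps, layerwise cosine similarity between adjacent layers, and a convergence index defined as one minus the inter-trajectory distance relative to the first layer. The function names and the array `H` (one row per layer, for example the per-layer hidden state of one token) are hypothetical.

```python
import numpy as np

EPS = 1e-12  # guards against division by zero for degenerate (zero-length) vectors


def trajectory_length(H):
    """Assumed definition: sum of Euclidean distances between consecutive layers.

    H has shape (num_layers + 1, hidden_dim), one row per layer.
    """
    steps = np.diff(H, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())


def mean_curvature(H):
    """Assumed definition: mean turning angle (radians) between consecutive steps."""
    steps = np.diff(H, axis=0)
    a, b = steps[:-1], steps[1:]
    denom = np.maximum(np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1), EPS)
    cos = np.clip((a * b).sum(axis=1) / denom, -1.0, 1.0)
    return float(np.arccos(cos).mean())


def layerwise_cosine(H):
    """Cosine similarity between the representations at layers l and l + 1."""
    a, b = H[:-1], H[1:]
    denom = np.maximum(np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1), EPS)
    return (a * b).sum(axis=1) / denom


def convergence_index(H_a, H_b):
    """Hypothetical index: 1 - d_l / d_0, where d_l is the distance between
    two trajectories at layer l. Positive values mean the pair has moved
    closer together than it was at the first layer."""
    d = np.linalg.norm(H_a - H_b, axis=1)
    return 1.0 - d / max(d[0], EPS)


# Toy trajectories in 2-D standing in for real hidden states.
straight = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
right_turn = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])

print(trajectory_length(straight), mean_curvature(straight))
print(trajectory_length(right_turn), mean_curvature(right_turn))
```

With real models, `H` could be assembled from per-layer hidden states, for instance by requesting `output_hidden_states=True` in Hugging Face Transformers and stacking one token position across layers. Which token position and which normalization the authors use is not stated in the abstract.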