The Truth Lies Somewhere in the Middle (of the Generated Tokens)
2026-05-11 • Machine Learning • Computation and Language
AI summary
The authors looked at how best to combine hidden states produced one after another by a language model to get a good overall representation of what the model "knows." They found that averaging (mean pooling) the hidden states of generated tokens gives a better semantic representation than using any single token's state. This suggests that information is spread out across multiple generated tokens rather than stored in one spot. The authors also showed that representations from generated tokens are more meaningful than those from the initial prompt, and that tracking how these representations change over the course of generation reveals interpretable dynamics in model behavior.
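As a minimal illustration of the pooling step described above (not the authors' code), the sketch below averages per-token hidden-state vectors into a single sequence representation. The hidden states are represented here as plain lists of floats, one vector per generated token; in practice they would come from a model's forward pass.

```python
# Illustrative sketch: mean pooling of per-token hidden states
# into one sequence-level representation vector.

def mean_pool(hidden_states):
    """Element-wise average of a list of equal-length hidden-state vectors.

    hidden_states: list of float lists, one per generated token.
    Returns a single vector with the same dimensionality.
    """
    if not hidden_states:
        raise ValueError("need at least one hidden state")
    dim = len(hidden_states[0])
    pooled = [0.0] * dim
    for h in hidden_states:
        for i, x in enumerate(h):
            pooled[i] += x
    return [x / len(hidden_states) for x in pooled]

# Toy example: three 4-dimensional hidden states.
states = [
    [1.0, 0.0, 2.0, -1.0],
    [3.0, 2.0, 0.0, 1.0],
    [2.0, 4.0, 1.0, 0.0],
]
print(mean_pool(states))  # [2.0, 2.0, 1.0, 0.0]
```

The pooled vector, rather than the last token's state alone, is what the paper finds to carry the more semantic representation.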
autoregressive models, hidden states, mean pooling, causal masking, semantic representations, kernel alignment, language models, prompt tokens
Authors
Sophie L. Wang, Phillip Isola, Brian Cheung
Abstract
How should hidden states generated autoregressively be collapsed into a representation that reflects a language model's internal state? Despite tokens being generated under causal masking, we find that mean pooling across their hidden states yields more semantic representations than any individual token alone. We quantify this through kernel alignment to reference spaces in language, vision, and protein domains. The improvement through mean pooling is consistent with information being distributed across generated tokens rather than localized to a single position. Furthermore, representations derived from generated tokens outperform those from prompt tokens, and alignment across generation reveals interpretable dynamics in model behavior.
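The abstract quantifies representation quality via kernel alignment to reference spaces. One common instantiation of this idea is linear centered kernel alignment (CKA); the sketch below is a hedged example of that metric, not necessarily the exact variant the paper uses. Given two matrices of representations for the same n items, it returns a similarity score in [0, 1].

```python
import numpy as np

def linear_cka(X, Y):
    """Linear centered kernel alignment between representation
    matrices X (n x d1) and Y (n x d2) of the same n items.

    Returns 1.0 when the two representation spaces are identical
    up to centering, scaling, and orthogonal transformation.
    """
    # Center each feature dimension across the n items.
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    # Alignment = ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F).
    hsic = np.linalg.norm(X.T @ Y, "fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, "fro")
    norm_y = np.linalg.norm(Y.T @ Y, "fro")
    return hsic / (norm_x * norm_y)

# Toy check: a representation is perfectly aligned with a
# rescaled copy of itself.
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 7.0]])
print(round(linear_cka(X, 2.0 * X), 6))  # 1.0
```

Under a metric like this, the paper compares pooled generated-token representations against reference embedding spaces in language, vision, and protein domains.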