Tensor Product Representation Probes Reveal Shared Structure Across Linear Directions

2026-05-11

Machine Learning
AI summary

The authors studied how a model understands the board game Othello by looking at its internal representations. They found that while some game information can be read out as simple linear directions (via linear probes), the model also encodes richer structure that can be captured with tensor product representations (TPRs), which bind together different types of information such as piece color and board position. Their approach shows that the simple directions can be derived from these more structured TPRs, suggesting the model's knowledge is more organized than it first appears. This helps explain how models represent relationships, not just isolated features.

linear probes, tensor product representations, Othello, board-state representation, embedding, binding matrix, representation learning, geometric structure
Authors
Andrew Lee, Fernanda Viégas, Martin Wattenberg
Abstract
While researchers are finding concepts represented as linear directions in language models, a bag of linear directions fails to capture relational structure. To better understand this dichotomy, we study a model with known linear representations, but trained in a highly structured domain -- the board game Othello. While the model's internal board-state representation is linearly decodable, we find additional structure in the form of tensor product representations (TPRs). We train TPR probes to recover shared structure amongst the linear probes, yielding a factorization into square-embeddings, color-embeddings, and a binding matrix that composes them to construct the model's board-state representation. We find geometric signatures within the weights of our TPR probe that align with the structure of the board, but perhaps more importantly, that the linear probes can be recovered directly from the parameters of our TPR probe. Our findings suggest that directional representations may be projections of more structured underlying representations.
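As a rough illustration of the factorization described in the abstract, the sketch below shows how a TPR probe could compose per-square and per-color embeddings through a binding matrix to reconstruct a board-state representation. All names, dimensions, and the specific binding form are illustrative assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn

class TPRProbe(nn.Module):
    """Hypothetical sketch of a tensor-product-representation probe.

    Factorizes a board-state representation into square embeddings,
    color embeddings, and a binding matrix that composes each
    (square, color) pair into the model's hidden space.
    """

    def __init__(self, n_squares=64, n_colors=3, d_square=32, d_color=8, d_model=512):
        super().__init__()
        self.square_emb = nn.Embedding(n_squares, d_square)  # one role vector per board square
        self.color_emb = nn.Embedding(n_colors, d_color)     # e.g. empty / black / white fillers
        # binding matrix: maps each (square x color) outer product into the model's hidden space
        self.binding = nn.Parameter(torch.randn(d_model, d_square * d_color) * 0.02)

    def forward(self, board):
        # board: (batch, n_squares) integer color label for each square
        sq = self.square_emb.weight                    # (n_squares, d_square)
        col = self.color_emb(board)                    # (batch, n_squares, d_color)
        # outer product binds each square's role vector to its color filler
        bound = torch.einsum('sd,bse->bsde', sq, col)  # (batch, n_squares, d_square, d_color)
        bound = bound.flatten(2)                       # (batch, n_squares, d_square * d_color)
        # sum over squares, then project through the binding matrix
        return bound.sum(dim=1) @ self.binding.T       # (batch, d_model)

# Such a probe would be trained so that its output matches the model's hidden
# activation (e.g. via MSE) for the board state corresponding to each input.
```

In this framing, a conventional linear probe for a single square would correspond to the binding matrix applied to that square's embedding, which is one way the linear probes could be recovered directly from the TPR probe's parameters, as the abstract describes.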