Language Models Compare Quantities Using Number-specific and Unit-specific Heuristics

2026-06-02 • Computation and Language

AI summary

The authors studied how language models compare numbers with units, like 110 cm versus 1.2 m. They found that the models often make mistakes when the values are very close, especially near the decision boundary where small differences matter. Instead of converting all measurements to a common scale before comparing, the models seem to use simple rules based on the numbers and units separately. The authors also showed that these number and unit features predict the model's decisions, and that intervening on them changes those decisions.

language models, measurement units, numerical comparison, unit conversion, decision boundary, heuristics, causal intervention, linear surrogate model, numerical difference, unit scale difference

Authors

Mutsumi Sasaki, Go Kamoda, Ryosuke Takahashi, Kosuke Sato, Kentaro Inui, Keisuke Sakaguchi, Benjamin Heinzerling

Abstract

Quantities with measurement units, such as 110 cm and 1.2 m, require language models (LMs) to combine a numeral with a symbolic unit scale. Here, we study how LMs compare such quantities in controlled settings spanning several unit systems. We find that accuracy degrades near the comparison boundary, where small changes in value determine the correct answer. The resulting errors are systematic: linear surrogate models predict LM preferences from numerical-difference and unit-scale-difference cues, and causal interventions on subspaces aligned with these variables shift the model's output. The results suggest that LMs compare quantities through a bag of heuristics over numerals and units, rather than first converting both expressions to an exact shared-scale representation.
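
To make the contrast in the abstract concrete, the following is a minimal Python sketch, not the authors' implementation. It compares an exact shared-scale comparison with a linear score over two cues, a numerical difference and a unit-scale difference. The unit table, the use of log10 features, the function names, and the weights are all assumptions chosen for illustration; in the paper the surrogate is fitted to LM preferences, whereas here the weights are set by hand.

```python
import math

# Assumed unit table (metres per unit); illustrative only.
UNIT_TO_METRES = {"mm": 1e-3, "cm": 1e-2, "m": 1.0, "km": 1e3}


def exact_compare(v1, u1, v2, u2):
    """True if the first quantity is larger after converting both to metres."""
    return v1 * UNIT_TO_METRES[u1] > v2 * UNIT_TO_METRES[u2]


def cues(v1, u1, v2, u2):
    """Two separate cues (assumed form): log-difference of numerals and of unit scales."""
    num_diff = math.log10(v1) - math.log10(v2)
    unit_diff = math.log10(UNIT_TO_METRES[u1]) - math.log10(UNIT_TO_METRES[u2])
    return num_diff, unit_diff


def surrogate_score(v1, u1, v2, u2, w_num=1.0, w_unit=1.0, bias=0.0):
    """Linear score over the two cues; a positive score means 'first is larger'.

    With w_num == w_unit == 1 and bias == 0 the score's sign matches the exact
    comparison. Hypothetical mismatched weights stand in for a heuristic that
    does not weigh the unit scale correctly.
    """
    num_diff, unit_diff = cues(v1, u1, v2, u2)
    return w_num * num_diff + w_unit * unit_diff + bias


# 110 cm vs 1.2 m lies close to the comparison boundary (1.1 m vs 1.2 m).
print(exact_compare(110, "cm", 1.2, "m"))                    # exact conversion
print(surrogate_score(110, "cm", 1.2, "m"))                  # balanced weights
print(surrogate_score(110, "cm", 1.2, "m", w_unit=0.9))      # underweighted unit cue
print(surrogate_score(5, "cm", 1.2, "m", w_unit=0.9))        # far from the boundary
```

In this sketch, underweighting the unit cue flips the sign for the near-boundary pair (110 cm vs 1.2 m) but not for the pair far from the boundary (5 cm vs 1.2 m), which mirrors the kind of systematic near-boundary error the abstract describes.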
