Geometric Latent Reasoning Induces Shorter Generations in LLMs

2026-06-01

Computation and Language
AI summary

The authors explore a way for large language models to solve problems by moving through a continuous space instead of generating every reasoning step as text. They treat the reasoning process as a path in the model's pretrained token-embedding space and introduce Geometric Latent Reasoning (GLR), in which a lightweight transition head predicts iterative direction updates in that space. Textual chain-of-thought traces serve as anchors, so the learned path approximates the discrete reasoning trajectory while still being allowed to deviate from exact token embeddings. On mathematical reasoning benchmarks with Qwen3 models, replacing early explicit reasoning with continuous latent steps leads to substantially shorter generations, even though no length objective is used, and the models often still reach correct answers in fewer total generation steps. The work points to a tradeoff between the latent computation budget, the output length, and accuracy.

large language models, latent reasoning, token embeddings, geometric path-approximation, chain-of-thought, latent space, Qwen3 models, reasoning trajectories, generation steps, continuous embeddings
Authors
Shashi Kumar, Yacouba Kaloga, Petr Motlicek, Ina Kodrasi, Andrea Cavallaro
Abstract
Large language models solve complex problems by generating lengthy chains of explicit reasoning tokens. While effective, this makes reasoning expensive, length-sensitive, and constrained to discrete natural language. Latent reasoning offers a continuous alternative, but determining useful structures for intermediate latent states remains an open challenge. In this paper, we formulate latent reasoning as a geometric path-approximation problem within the model's pretrained token-embedding space. We introduce Geometric Latent Reasoning (GLR), which uses a lightweight transition head to predict iterative direction updates in embedding space. Using textual chain-of-thought traces as anchors, GLR learns to approximate discrete reasoning trajectories while permitting continuous deviations from exact token embeddings. Evaluations on mathematical reasoning benchmarks using Qwen3 models reveal an emergent phenomenon: geometric latent reasoning induces substantially shorter generations without an explicit length objective. By replacing early explicit reasoning with continuous latent steps, models often reach correct answers using far fewer total generation steps. These findings suggest that continuous trajectories act as compact intermediate reasoning states, exposing a new tradeoff between latent computation budget, output length, and accuracy.
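
The path-approximation idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the update rule (a unit direction scaled by a fixed step size), the mean-squared path loss, and every name below (`rollout`, `path_loss`, `oracle_head`, `step_size`) are assumptions made for illustration. The `oracle_head` function is a hand-written stand-in for a trained transition head, and the two-dimensional `anchors` stand in for the embeddings of chain-of-thought tokens.

```python
import math


def normalize(v):
    """Scale v to unit length; return a copy unchanged if v is the zero vector."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def rollout(z0, head, num_steps, step_size):
    """Iterate z_{k+1} = z_k + step_size * normalize(head(z_k, k)).

    `head` plays the role of a transition head: given the current latent
    state and the step index, it returns a direction in embedding space.
    """
    states = [list(z0)]
    for k in range(num_steps):
        direction = normalize(head(states[-1], k))
        states.append([z + step_size * d for z, d in zip(states[-1], direction)])
    return states


def path_loss(states, anchors):
    """Mean squared distance between latent state k+1 and anchor embedding k."""
    pairs = list(zip(states[1:], anchors))
    total = sum(
        sum((s - a) ** 2 for s, a in zip(state, anchor))
        for state, anchor in pairs
    )
    return total / len(pairs)


# Toy anchor embeddings (assumed data, not taken from the paper).
anchors = [[1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]


def oracle_head(z, k):
    """Hypothetical stand-in for a trained head: points from z toward anchor k."""
    return [a - x for a, x in zip(anchors[k], z)]


# A unit step lands on every anchor; a shorter step stays off the anchors.
exact_states = rollout([0.0, 0.0], oracle_head, num_steps=3, step_size=1.0)
loose_states = rollout([0.0, 0.0], oracle_head, num_steps=3, step_size=0.5)
exact_loss = path_loss(exact_states, anchors)
loose_loss = path_loss(loose_states, anchors)
```

In this sketch the latent states are free to sit between anchors, which mirrors the abstract's point that the trajectory may deviate from exact token embeddings. A trained head would replace `oracle_head`, and the loss would be minimized over its parameters instead of being evaluated on a fixed rule.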