Latent Thought Flow: Efficient Latent Reasoning in Large Language Models

2026-06-15 • Artificial Intelligence

Artificial Intelligence, Machine Learning

AI summary

The authors observe that when large language models reason step by step in words, every step must be decoded into tokens, which makes inference slow. They propose Latent Thought Flow (LTF), which lets a model reason in a hidden, continuous space instead of writing out each step. The method trains a sampler over variable-length thought paths so that sampling reflects both answer quality and computation cost. Experiments under finetuning and transfer learning settings show higher accuracy and shorter reasoning than earlier latent reasoning methods.

Large Language Models, Chain-of-Thought, Latent Reasoning, Continuous Space, GFlowNet, Posterior Sampling, Entropy, Subtrajectory Balance, Inference Efficiency

Authors

Xiandong Zou, Jing Huang, Jianshu Li, Pan Zhou

Abstract

Large Language Models (LLMs) increasingly rely on intermediate reasoning, yet explicit Chain-of-Thought (CoT) suffers from a linguistic space bottleneck: each thought must be decoded into tokens, causing high inference overhead. Latent reasoning moves deliberation into continuous space, but existing methods mostly learn deterministic or reward-maximizing paths, lacking a principled way to allocate probability across trajectories with different correctness and costs. We propose Latent Thought Flow (LTF), which models reasoning as variable-length continuous trajectories and trains a sampler to match a reward-induced posterior over answer quality and computation cost. We instantiate this with a continuous GFlowNet using stochastic latent transitions. To handle sparse answer supervision, we introduce an Entropy-Weighted Subtrajectory Balance objective for intermediate rewards and a reference-prior regularizer to anchor exploration. Experiments under finetuning and transfer learning settings show that LTF outperforms explicit CoT and latent reasoning baselines, improving accuracy by 9.5% while reducing reasoning length by 27.2% on average compared with strong latent reasoning baselines.
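
For readers unfamiliar with subtrajectory balance, the following is a minimal sketch of the standard, geometrically weighted subtrajectory balance loss used to train GFlowNets, computed for a single trajectory. It is not the paper's Entropy-Weighted objective, whose weighting scheme the abstract does not specify; the fixed `lam ** length` weights below are a stand-in. The function name, argument names, and the toy numbers are all hypothetical. In a continuous setting such as LTF, the forward and backward terms would presumably be log-densities of the stochastic latent transitions.

```python
import numpy as np

def subtb_loss(log_flow, log_pf, log_pb, lam=0.9):
    """Standard lambda-weighted subtrajectory balance loss for one trajectory.

    log_flow: length n+1, log state flows log F(s_0), ..., log F(s_n);
              by convention the last entry is the log reward of the terminal state.
    log_pf:   length n, forward log-probabilities log P_F(s_{t+1} | s_t).
    log_pb:   length n, backward log-probabilities log P_B(s_t | s_{t+1}).
    lam:      geometric weight on subtrajectory length (assumed stand-in,
              NOT the entropy-based weighting described in the abstract).
    """
    log_flow = np.asarray(log_flow, dtype=float)
    log_pf = np.asarray(log_pf, dtype=float)
    log_pb = np.asarray(log_pb, dtype=float)
    n = log_pf.shape[0]

    # Prefix sums: the sum of (log P_F - log P_B) over steps i..j-1 is cum[j] - cum[i].
    cum = np.concatenate([[0.0], np.cumsum(log_pf - log_pb)])

    total, weight_sum = 0.0, 0.0
    for i in range(n):
        for j in range(i + 1, n + 1):
            # Balance residual for the subtrajectory s_i -> ... -> s_j.
            residual = log_flow[i] + (cum[j] - cum[i]) - log_flow[j]
            w = lam ** (j - i)
            total += w * residual ** 2
            weight_sum += w
    return total / weight_sum

# Toy example: flows that satisfy balance exactly give zero loss.
balanced = subtb_loss([3.0, 2.0, 0.0], [-1.0, -2.0], [0.0, 0.0])
# Perturbing the initial flow breaks balance on every subtrajectory starting at s_0.
perturbed = subtb_loss([4.0, 2.0, 0.0], [-1.0, -2.0], [0.0, 0.0])
```

Because every subtrajectory contributes a residual, intermediate states receive a training signal even when the reward is only observed at the end of the trajectory, which is the general motivation for subtrajectory-level objectives under sparse supervision.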
