Why Struggle with Continuous Latents? Interpretable Discrete Latent Reasoning via Rendered Compression
2026-06-29 • Computation and Language
AI summary
The authors address the problem that current large language models must generate very long reasoning sequences to solve complex problems, which makes inference slow. They introduce Discrete Latent Reasoning (DLR), which compresses reasoning steps into a short sequence of discrete tokens: written reasoning chains are rendered as images, visual features are extracted from those images, and the features are clustered into a discrete latent vocabulary. According to the authors, this makes reasoning more efficient and the reasoning steps easier to interpret than previous approaches based on continuous representations. Their experiments report that DLR outperforms prior latent reasoning baselines with up to 20× compression while retaining an interpretable semantic structure.
Large Language Models, Chain-of-Thought Reasoning, Latent Space, Discrete Tokens, Autoregressive Modeling, Reinforcement Learning, Compression, Visual Feature Extraction, Semantic Structure, Symbolic Supervision
Authors
Shuochen Chang, Qingyang Liu, Shaobo Wang, Bingjie Gao, Qianli Ma, Haonan Zhao, Yibo Miao, Yulin Sun, Zelin Peng, Jiangtong Li, Li Niu
Abstract
Large language models achieve high reasoning performance via explicit chain-of-thought and reinforcement learning, but require long output sequences and extended inference time. Latent reasoning reduces this cost by shifting computation into a latent space; however, continuous latent methods are hard to train, suffering from unstable and uninterpretable reasoning trajectories. We argue these issues stem from a misalignment between continuous-space reasoning and discrete symbolic supervision, as continuous states lack explicit anchors for step-by-step alignment. To resolve this, we propose Discrete Latent Reasoning (DLR), the first method that converts continuous latent states into explicit discrete tokens. Inspired by render-based compression, we render textual chains of thought into images, extract visual features, and construct a discrete latent vocabulary via clustering-based fine-tuning. Expanding the vocabulary and output head enables standard autoregressive modeling over both natural language and latent tokens, supporting pretraining alignment, SFT, and RL. Experiments on five reasoning benchmarks and two model series (Qwen3-VL and LLaMA-3) confirm that DLR outperforms prior latent reasoning baselines with up to 20× compression. Furthermore, the learned latent trajectories retain an interpretable semantic structure. Overall, discrete latent tokens provide a controllable and interpretable basis for efficient latent reasoning.
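The step of turning continuous visual features into discrete latent tokens can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the clustering procedure, so plain k-means is swapped in here, random vectors stand in for the visual features of rendered chain-of-thought images, and all names (`kmeans`, `to_latent_tokens`, `BASE_VOCAB`, `K`) and sizes are hypothetical. Each feature is mapped to the index of its nearest centroid, and that index is offset by the size of the text vocabulary so the latent tokens occupy new ids appended after the existing ones.

```python
import numpy as np


def nearest(features, centroids):
    """Return, for each feature vector, the index of its closest centroid."""
    # Squared Euclidean distance from every feature to every centroid.
    dists = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)


def kmeans(features, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed stand-in for the paper's clustering)."""
    rng = np.random.default_rng(seed)
    # Initialise centroids from k distinct feature vectors.
    picks = rng.choice(len(features), size=k, replace=False)
    centroids = features[picks].copy()
    for _ in range(iters):
        assign = nearest(features, centroids)
        for j in range(k):
            members = features[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean(axis=0)
    return centroids


def to_latent_tokens(features, centroids, base_vocab_size):
    """Map features to token ids placed after the existing text vocabulary."""
    return nearest(features, centroids) + base_vocab_size


# Hypothetical sizes: 200 feature vectors of dimension 16, 8 latent tokens,
# and a text vocabulary of 1000 entries.
K = 8
BASE_VOCAB = 1000
rng = np.random.default_rng(0)
features = rng.normal(size=(200, 16)).astype(np.float32)  # stand-in features

codebook = kmeans(features, K)
tokens = to_latent_tokens(features[:5], codebook, BASE_VOCAB)
print(tokens)  # five ids in the range [BASE_VOCAB, BASE_VOCAB + K)
```

In this sketch the language model's embedding table and output head would need `K` extra rows so that the new ids can be predicted autoregressively alongside ordinary text tokens, which corresponds to the vocabulary expansion the abstract describes.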