Follow the Latent Roadmap: Navigating Revocable Decoding for Diffusion LLMs with Anchor Tokens

2026-06-15 • Computation and Language

Computation and Language • Artificial Intelligence

AI summary

The authors study a type of language model called diffusion LLMs that generate text in parallel, which can be fast but sometimes makes mistakes that get worse over time. They identify two main problems: errors spreading from bad context and mistakes reinforcing each other, making them hard to fix. To address this, they propose ASRD, a method that separates reliable parts of the generated text (anchors) from uncertain parts to better guide the model’s corrections. Their approach improves accuracy and speeds up text generation on tasks like math and coding without needing extra training.

Diffusion Large Language Models, Parallel Generation, Decoding Speed, Revocable Decoding, Error Propagation, Anchor Tokens, Temporal Consistency, Entropy-weighted Signals, Orthogonal Perturbations, Inference Throughput

Authors

Yizhen Yao, Qinglin Zhu, Runcong Zhao, Xiangxiang Dai, Yanzheng Xiang, Yulan He, Lin Gui

Abstract

Diffusion Large Language Models (dLLMs) offer a promising avenue for parallel generation but face a trade-off between decoding speed and quality. While revocable decoding strategies attempt to mitigate errors by verifying and remasking tokens, they typically operate within a mixed-quality context. This leads to two critical failures: Error Propagation, where new tokens absorb toxic information from erroneous context, and Local Error Reinforcement, where errors mutually reinforce each other to evade detection. To alleviate these challenges, we propose ASRD (Anchor Supervised Revocable Decoding), a training-free framework that operates within the embedding space. ASRD explicitly decouples the decoding context into trusted Anchor Tokens, which are identified via temporal consistency, and uncertain candidates. Leveraging a dynamic Anchor Tokens Cache, we introduce two complementary mechanisms: (1) Anchor-Guided Generation, which injects entropy-weighted anchor signals into masked positions to implicitly rectify attention toward the reliable global skeleton; and (2) Anchor-Perturbed Verification, which applies orthogonal perturbations to uncertain candidate tokens, destabilizing and remasking errors driven by fragile local consensus. Extensive experiments on math and coding benchmarks demonstrate that ASRD outperforms recent remasking baselines, achieving accuracy improvements of up to 6.4% while accelerating inference throughput by up to 7.2×.
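The abstract names three ingredients without giving formulas: temporal consistency, entropy weighting, and orthogonal perturbation. The Python sketch below illustrates one plausible reading of each and should not be taken as the authors' implementation. Every function name, the `window` parameter, the choice to normalize entropy by log vocabulary size, and the choice of which direction the perturbation is made orthogonal to are assumptions made for illustration only.

```python
import math


def find_anchor_positions(history, window=3):
    """Hypothetical temporal-consistency check (an assumption, not from the paper).

    history: list of decoding steps, oldest first; each step is a list holding
    one predicted token id per sequence position.
    Returns the positions whose predicted token id was identical across the
    last `window` steps.
    """
    if len(history) < window:
        return []
    recent = history[-window:]
    first = recent[0]
    return [
        pos
        for pos in range(len(first))
        if all(step[pos] == first[pos] for step in recent)
    ]


def normalized_entropy(probs):
    """Shannon entropy of a distribution divided by log(len(probs)).

    0 means fully certain, 1 means uniform. Using this value as the weight of
    an anchor signal at a masked position is an assumption.
    Requires len(probs) > 1.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return entropy / math.log(len(probs))


def orthogonal_component(noise, direction):
    """Remove from `noise` its projection onto `direction` (one Gram-Schmidt step).

    The result is orthogonal to `direction`. Which direction ASRD perturbs
    orthogonally to is not stated in the abstract; this only shows the
    projection step itself.
    """
    dot_nd = sum(n * d for n, d in zip(noise, direction))
    dot_dd = sum(d * d for d in direction)
    if dot_dd == 0:
        return list(noise)
    scale = dot_nd / dot_dd
    return [n - scale * d for n, d in zip(noise, direction)]
```

For example, with predictions over three positions recorded for four steps, `find_anchor_positions([[5, 7, 9], [5, 8, 9], [5, 8, 9], [5, 8, 2]], window=3)` keeps positions 0 and 1, because position 2 changed in the most recent step.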
