Unified Energy for Invariant and Independent Decoding in Diffusion Language Models

2026-06-08

Computation and Language, Artificial Intelligence
AI summary

The authors study why Diffusion Language Models (DLMs), which generate text by iteratively denoising a whole sequence rather than producing it token by token, lag behind auto-regressive baselines, particularly as more tokens are decoded in parallel. They identify three contributing factors: model capacity, dependencies between tokens, and invariance. To address these, they propose a unified energy (Uni-E) that accounts for all three factors, can be computed exactly without sampling-based partition estimation, and is model agnostic, so it applies to models of any size. They prove that Uni-E corrects the distribution shift caused by dependency and invariance, and report experiments on DLMs that support its effectiveness.

Diffusion Language Models, Auto-regressive decoding, Model capacity, Token dependency, Invariance, Energy-based models, Unified energy (Uni-E), Sampling estimator, Distribution shift, Parallel text generation
Authors
Yuchen Yan, Minkai Xu, Zaiquan Yang, Yatao Bian
Abstract
Diffusion Language Models (DLMs) enable parallel text generation by iteratively denoising a full sequence, offering attractive flexibility compared to auto-regressive (AR) decoding. However, existing methods fail to fully capture token relationships, leading to a performance gap relative to AR baselines, especially as the degree of parallelism increases. In this paper, we give a systematic analysis of this gap, identifying three key factors: (i) model capacity, (ii) dependency, and (iii) invariance. To address these issues, we first propose an invariant energy (Inv-E), together with an effective sampling-based estimator, to handle the invariance issue. By further combining it with the independent energy (Ind-E), we obtain a unified energy (Uni-E) that accounts for all of these factors. Uni-E enjoys a unique advantage: it can be computed exactly without sampling-based partition estimation. In addition, Uni-E is model agnostic and can therefore be scaled to models of arbitrary size. We further prove that Uni-E can correct the distribution shift caused by dependency and invariance. Extensive experiments across DLMs and Diffusion Large Language Models (DLLMs) demonstrate the effectiveness of the proposed Uni-E.
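The dependency factor named in the abstract can be illustrated with a toy example. The sketch below is not the paper's Uni-E, Inv-E, or Ind-E. It assumes a hypothetical two-token joint distribution with made-up probabilities and shows that sampling each position independently from its marginal, as fully parallel decoding does, places mass on sequences the joint never produces. The `reweight` helper is a generic reweighting step, included only to show how a weight on whole sequences can remove that shift. It uses the true joint, which is available here only because the example is a toy.

```python
from itertools import product

# Hypothetical joint distribution over two token positions (illustrative numbers).
joint = {
    ("New", "York"): 0.5,
    ("Los", "Angeles"): 0.5,
}

def marginal(dist, position):
    """Marginal distribution of the token at a single position."""
    out = {}
    for tokens, p in dist.items():
        out[tokens[position]] = out.get(tokens[position], 0.0) + p
    return out

def factorized(dist):
    """Product of per-position marginals: the distribution that
    independent, fully parallel sampling of both positions draws from."""
    m0, m1 = marginal(dist, 0), marginal(dist, 1)
    return {(a, b): m0[a] * m1[b] for a, b in product(m0, m1)}

def total_variation(p, q):
    """Total variation distance between two discrete distributions."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def reweight(proposal, weight):
    """Generic reweighting: p(x) proportional to proposal(x) * weight(x).
    This is an assumption for illustration, not the paper's energy."""
    unnorm = {x: q * weight(x) for x, q in proposal.items()}
    z = sum(unnorm.values())
    return {x: u / z for x, u in unnorm.items() if u > 0}

parallel = factorized(joint)

# Mass that parallel sampling puts on sequences outside the joint's support,
# such as ("New", "Angeles").
invalid_mass = sum(p for x, p in parallel.items() if x not in joint)

# Distribution shift between the true joint and the factorized approximation.
gap = total_variation(joint, parallel)

# Reweighting by joint(x) / parallel(x) recovers the joint in this toy case.
corrected = reweight(parallel, lambda x: joint.get(x, 0.0) / parallel[x])

print(f"invalid mass: {invalid_mass}, shift before: {gap}, "
      f"shift after: {total_variation(joint, corrected)}")
```

In this toy case, half of the factorized distribution's mass falls on invalid pairs. Decoding more positions in parallel multiplies the number of such independent choices, which is consistent with the abstract's observation that the gap widens as parallelism increases.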