Who Should Lead Decoding Now? Tracking Reliable Trajectories for Ensembling Masked Diffusion Language Models
2026-06-15 • Computation and Language
Computation and Language • Artificial Intelligence
AI summary
The authors study Masked Diffusion Language Models (MDLMs), a type of language model that generates text by denoising a sequence over multiple steps. They observe that successful generations tend to show stable confidence over the answer-relevant positions during decoding, and that when a model's trajectory becomes unreliable, injecting promising intermediate states from other models can often correct it. Based on this, they introduce TIE, a method in which multiple MDLMs identify which model is currently on the more reliable trajectory and pass its partially denoised sequence to the others, so that different models can contribute at different stages of generation. Their experiments report strong performance across diverse reasoning tasks.
Masked Diffusion Language Models, sequence generation, decoding dynamics, confidence dynamics, ensembling, trajectory-based methods, denoising, iterative methods, knowledge fusion
Authors
Heecheol Yun, Joonhyung Park, Joowon Kim, Eunho Yang
Abstract
Masked Diffusion Language Models (MDLMs) have emerged as a distinct paradigm for sequence generation. As MDLMs become diverse in capabilities and knowledge coverage, an important question is how to combine their knowledge. Toward this, we first investigate the unique decoding dynamics of MDLMs. We find that successful generations exhibit stable confidence dynamics over answer-relevant positions, while unreliable trajectories can often be corrected by injecting promising intermediate states from other models. Guided by this observation, we propose $\textbf{TIE}$ ($\textbf{T}$rajectory-based $\textbf{I}$terative $\textbf{E}$nsembling), a knowledge fusion framework in which MDLMs iteratively identify reliable decoding trajectories and relay them across models. TIE tracks confidence dynamics over answer-relevant positions to determine which model currently follows a more reliable trajectory and selectively transfers partially denoised sequences across models. As the model on the more promising trajectory often changes across denoising steps, TIE allows different models to contribute complementary strengths at different stages of generation. Strong performance across diverse reasoning tasks, along with our analyses, suggests that TIE offers a practical approach to the underexplored problem of MDLM ensembling.
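The relay idea described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the abstract does not specify how reliability is scored, how answer-relevant positions are found, or when transfers happen. The sketch below assumes a hypothetical score (recent mean confidence minus its standard deviation over a short window), takes the answer-relevant positions as a given argument, and relays the leader's partially denoised sequence to every model after each step. The names `reliability`, `tie_decode`, `steady_model`, and `jittery_model` are all illustrative.

```python
from statistics import mean, pstdev
from typing import Callable, List, Optional, Sequence, Tuple

MASK = None  # marker for a position that has not been denoised yet

# Toy stand-in for an MDLM: maps a partially masked sequence to a
# (token, confidence) pair for every position. Purely hypothetical.
Model = Callable[[Sequence[Optional[str]]], List[Tuple[str, float]]]


def reliability(history: List[float], window: int = 3) -> float:
    """Assumed score: high and stable recent confidence is better."""
    recent = history[-window:]
    return mean(recent) - pstdev(recent)


def tie_decode(
    models: List[Model],
    length: int,
    answer_positions: List[int],
    tokens_per_step: int = 1,
) -> Tuple[List[Optional[str]], List[int]]:
    """Decode with several toy models, relaying the leader's state each step."""
    states = [[MASK] * length for _ in models]
    histories: List[List[float]] = [[] for _ in models]
    leaders: List[int] = []

    while any(tok is MASK for tok in states[0]):
        for j, model in enumerate(models):
            proposal = model(states[j])
            # Track confidence over the (given) answer-relevant positions.
            histories[j].append(mean(proposal[i][1] for i in answer_positions))
            # Each model unmasks its own most confident masked positions.
            masked = [i for i, tok in enumerate(states[j]) if tok is MASK]
            masked.sort(key=lambda i: proposal[i][1], reverse=True)
            for i in masked[:tokens_per_step]:
                states[j][i] = proposal[i][0]

        # The model with the most reliable trajectory so far leads this step.
        leader = max(range(len(models)), key=lambda j: reliability(histories[j]))
        leaders.append(leader)
        # Relay the leader's partially denoised sequence to all models.
        states = [list(states[leader]) for _ in models]

    return states[0], leaders


def steady_model(state: Sequence[Optional[str]]) -> List[Tuple[str, float]]:
    """Toy model with constant confidence."""
    return [("a", 0.9) for _ in state]


def jittery_model(state: Sequence[Optional[str]]) -> List[Tuple[str, float]]:
    """Toy model whose confidence oscillates from step to step."""
    n_masked = sum(tok is MASK for tok in state)
    conf = 0.95 if n_masked % 2 == 0 else 0.5
    return [("b", conf) for _ in state]
```

Calling `tie_decode([steady_model, jittery_model], length=4, answer_positions=[2, 3])` returns the final sequence together with the index of the leading model at each step. In this toy setup the oscillating model leads only while its confidence history is still short, after which the steadier model takes over, which mirrors the abstract's point that the model on the more promising trajectory can change across denoising steps.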