Beyond Single Tokens: Distilling Discrete Diffusion Models via Discrete MMD

2026-03-20

Machine Learning · Computer Vision and Pattern Recognition
AI summary

The authors address the challenge of distilling discrete diffusion models, which are used for generating data such as text and images, so that they require fewer sampling steps. They introduce a new method called Discrete Moment Matching Distillation (D-MMD), inspired by techniques from continuous diffusion models. Unlike previous discrete distillation methods, which tend to collapse and lose quality, D-MMD keeps outputs high-quality and diverse when enough sampling steps are used. The authors show that their distilled models can even outperform the original teacher models on several text and image datasets.

Discrete diffusion models · Continuous diffusion models · Distillation · Sampling steps · Moment matching · Model compression · Text generation · Image generation · Generative models
Authors
Emiel Hoogeboom, David Ruhe, Jonathan Heek, Thomas Mensink, Tim Salimans
Abstract
It is currently difficult to distill discrete diffusion models. In contrast, the continuous diffusion literature has many distillation methods that can reduce sampling to a handful of steps. Our method, Discrete Moment Matching Distillation (D-MMD), leverages ideas that have been highly successful in the continuous domain. Whereas previous discrete distillation methods collapse, D-MMD maintains high quality and diversity (given sufficient sampling steps). This is demonstrated on both text and image datasets. Moreover, the newly distilled generators can outperform their teachers.
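The abstract does not spell out the training objective, so as background only: the classical way to measure a discrepancy between two sample sets, as in kernel-based maximum mean discrepancy (MMD), can be sketched for discrete token sequences. This is a generic illustration, not the paper's D-MMD objective; the kernel choice and all function names here are assumptions for the sake of the example.

```python
import numpy as np

def match_kernel(a, b):
    """A simple positive-definite kernel on token sequences:
    the fraction of positions where the two sequences agree.
    (Illustrative choice; not from the paper.)"""
    return np.mean(a == b, axis=-1)

def mmd2(x, y, kernel=match_kernel):
    """Biased estimate of squared MMD between two sets of token
    sequences with shapes (n, L) and (m, L): E[k(x,x')] + E[k(y,y')]
    - 2 E[k(x,y)], with expectations replaced by sample means."""
    kxx = kernel(x[:, None, :], x[None, :, :]).mean()
    kyy = kernel(y[:, None, :], y[None, :, :]).mean()
    kxy = kernel(x[:, None, :], y[None, :, :]).mean()
    return kxx + kyy - 2.0 * kxy

# Toy check: identical sample sets give zero discrepancy;
# samples over disjoint vocabularies give a positive one.
rng = np.random.default_rng(0)
teacher = rng.integers(0, 10, size=(32, 16))
student_same = teacher.copy()
student_diff = rng.integers(10, 20, size=(32, 16))
print(round(mmd2(teacher, student_same), 6))  # 0.0
print(mmd2(teacher, student_diff) > 0)        # True
```

In a distillation setting one would minimize such a discrepancy between teacher and student samples; how D-MMD actually constructs and optimizes its moment-matching objective is detailed in the paper itself.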