Discrete Diffusion in Large Language and Multimodal Models: A Survey

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that uses full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception, capabilities that were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10x faster inference.
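To make the decoding paradigm concrete, below is a minimal sketch of mask-based discrete diffusion decoding in Python (PyTorch). The stub denoiser, the [MASK] token id, and the confidence-based unmasking schedule are illustrative assumptions for this sketch, not the survey's specific formulation.

import torch

@torch.no_grad()
def diffusion_decode(model, length, mask_id, steps=8):
    # Start from an all-[MASK] sequence and iteratively denoise it.
    seq = torch.full((1, length), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(seq)  # full (bidirectional) attention over the whole sequence
        conf, pred = logits.softmax(-1).max(-1)  # per-position prediction and confidence
        still_masked = seq.eq(mask_id)
        # Commit a growing fraction of positions each step, keeping the most
        # confident predictions; all commitments within a step happen in parallel.
        n_new = int(length * (step + 1) / steps) - int(length * step / steps)
        n_new = min(n_new, int(still_masked.sum()))
        conf = conf.masked_fill(~still_masked, -1.0)  # never re-select committed tokens
        idx = conf.topk(n_new, dim=-1).indices
        seq[0, idx[0]] = pred[0, idx[0]]
    return seq

# Toy usage: a stub "denoiser" returning random logits over a 100-token vocabulary.
# (A trained denoiser would also exclude the mask token from its predictions.)
vocab, mask_id = 100, 0
stub = lambda s: torch.randn(s.shape[0], s.shape[1], vocab)
print(diffusion_decode(stub, length=16, mask_id=mask_id))

Because each step commits several tokens at once rather than one token per forward pass, fewer model calls are needed per sequence, which is the source of the inference speedups described above.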
@article{yu2025_2506.13759,
  title   = {Discrete Diffusion in Large Language and Multimodal Models: A Survey},
  author  = {Runpeng Yu and Qi Li and Xinchao Wang},
  journal = {arXiv preprint arXiv:2506.13759},
  year    = {2025}
}