
Discrete Diffusion in Large Language and Multimodal Models: A Survey

Main: 22 pages, 9 figures; Appendix: 6 pages
Abstract

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that combines full attention with a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception, capabilities that were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10x faster inference.
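To make the decoding paradigm concrete, below is a minimal sketch (not taken from the survey) of a confidence-based masked-diffusion decoding loop: the response starts as all [MASK] tokens, the model attends over the full sequence, and each denoising step unmasks the most confident positions in parallel. The `model` callable, `mask_id`, and the top-k unmasking rule are illustrative assumptions, not the specific algorithm of any paper covered here.

```python
import torch

@torch.no_grad()
def parallel_denoise(model, prompt_ids, gen_len=64, steps=8, mask_id=0):
    """Sketch of dLLM-style parallel decoding under stated assumptions.

    `model` is a hypothetical stand-in for any bidirectional denoiser
    that maps a (1, L) token batch to (1, L, V) logits; `mask_id` is
    its mask-token id.
    """
    # Start from the prompt followed by an all-[MASK] response.
    x = torch.cat([prompt_ids,
                   torch.full((gen_len,), mask_id, dtype=torch.long)])
    per_step = max(1, gen_len // steps)      # tokens revealed per step
    for _ in range(steps):
        masked = (x == mask_id).nonzero(as_tuple=True)[0]
        if masked.numel() == 0:
            break
        logits = model(x.unsqueeze(0))[0]    # full (non-causal) attention
        probs = logits.softmax(-1)
        conf, pred = probs[masked].max(-1)   # confidence per masked slot
        # Unmask the most confident positions in parallel.
        top = conf.topk(min(per_step, masked.numel())).indices
        x[masked[top]] = pred[top]
    return x
```

Because several positions are committed per step, the number of forward passes scales with `steps` rather than with the response length, which is the source of the inference speedups the abstract mentions.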

@article{yu2025_2506.13759,
  title={Discrete Diffusion in Large Language and Multimodal Models: A Survey},
  author={Runpeng Yu and Qi Li and Xinchao Wang},
  journal={arXiv preprint arXiv:2506.13759},
  year={2025}
}