Discrete Diffusion in Large Language and Multimodal Models: A Survey

In this work, we provide a systematic survey of Discrete Diffusion Language Models (dLLMs) and Discrete Diffusion Multimodal Language Models (dMLLMs). Unlike autoregressive (AR) models, dLLMs and dMLLMs adopt a multi-token, parallel decoding paradigm that uses full attention and a denoising-based generation strategy. This paradigm naturally enables parallel generation, fine-grained output controllability, and dynamic, response-aware perception, capabilities that were previously difficult to achieve with AR models. Recently, a growing number of industrial-scale proprietary d(M)LLMs, as well as a large number of open-source academic d(M)LLMs, have demonstrated performance comparable to their autoregressive counterparts while achieving up to 10x faster inference.
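To make the decoding paradigm concrete, below is a minimal sketch of mask-based discrete diffusion decoding in Python (PyTorch). The stub denoiser, the [MASK] token id, and the confidence-based unmasking schedule are illustrative assumptions for this sketch, not the survey's specific formulation.

import torch

@torch.no_grad()
def diffusion_decode(model, length, mask_id, steps=8):
    # Start from an all-[MASK] sequence and iteratively denoise it.
    seq = torch.full((1, length), mask_id, dtype=torch.long)
    for step in range(steps):
        logits = model(seq)  # full (bidirectional) attention over the whole sequence
        conf, pred = logits.softmax(-1).max(-1)  # per-position prediction and confidence
        still_masked = seq.eq(mask_id)
        # Commit a growing fraction of positions each step, keeping the most
        # confident predictions; all commitments within a step happen in parallel.
        n_new = int(length * (step + 1) / steps) - int(length * step / steps)
        n_new = min(n_new, int(still_masked.sum()))
        conf = conf.masked_fill(~still_masked, -1.0)  # never re-select committed tokens
        idx = conf.topk(n_new, dim=-1).indices
        seq[0, idx[0]] = pred[0, idx[0]]
    return seq

# Toy usage: a stub "denoiser" returning random logits over a 100-token vocabulary.
# (A trained denoiser would also exclude the mask token from its predictions.)
vocab, mask_id = 100, 0
stub = lambda s: torch.randn(s.shape[0], s.shape[1], vocab)
print(diffusion_decode(stub, length=16, mask_id=mask_id))

Because each step commits several tokens at once rather than one token per forward pass, fewer model calls are needed per sequence, which is the source of the inference speedups described above.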
@article{yu2025_2506.13759,
  title   = {Discrete Diffusion in Large Language and Multimodal Models: A Survey},
  author  = {Runpeng Yu and Qi Li and Xinchao Wang},
  journal = {arXiv preprint arXiv:2506.13759},
  year    = {2025}
}