ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.15781
  4. Cited By
dKV-Cache: The Cache for Diffusion Language Models

dKV-Cache: The Cache for Diffusion Language Models

21 May 2025
Xinyin Ma
Runpeng Yu
Gongfan Fang
Xinchao Wang
    DiffM
ArXivPDFHTML

Papers citing "dKV-Cache: The Cache for Diffusion Language Models"

49 / 49 papers shown
Title
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Muddit: Liberating Generation Beyond Text-to-Image with a Unified Discrete Diffusion Model
Qingyu Shi
Jinbin Bai
Zhuoran Zhao
Wenhao Chai
Kaidong Yu
...
Shuangyong Song
Yunhai Tong
Xiangtai Li
X. Li
Shuicheng Yan
47
0
0
29 May 2025
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
FasterCache: Training-Free Video Diffusion Model Acceleration with High Quality
Zhengyao Lv
Chenyang Si
Junhao Song
Zhenyu Yang
Ping Luo
Ziwei Liu
Kwan-Yee K. Wong
VGen
DiffM
109
12
0
13 Mar 2025
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Marianne Arriola
Aaron Gokaslan
Justin T Chiu
Zhihan Yang
Zhixuan Qi
Jiaqi Han
Subham Sekhar Sahoo
Volodymyr Kuleshov
DiffM
164
11
0
12 Mar 2025
Implicit Search via Discrete Diffusion: A Study on Chess
Implicit Search via Discrete Diffusion: A Study on Chess
Jiacheng Ye
Zhenyu Wu
Jiahui Gao
Zhiyong Wu
Xin Jiang
Zhiyu Li
Dianbo Sui
DiffM
69
4
0
27 Feb 2025
Large Language Diffusion Models
Large Language Diffusion Models
Shen Nie
Fengqi Zhu
Zebin You
Xiaolu Zhang
Jingyang Ou
Jun Hu
Jun Zhou
Yankai Lin
Ji-Rong Wen
Chongxuan Li
152
38
0
14 Feb 2025
Theoretical Benefit and Limitation of Diffusion Language Model
Theoretical Benefit and Limitation of Diffusion Language Model
Guhao Feng
Yihan Geng
Jian Guan
Wei Wu
Liwei Wang
Di He
DiffM
103
1
0
13 Feb 2025
Energy-Based Diffusion Language Models for Text Generation
Energy-Based Diffusion Language Models for Text Generation
Minkai Xu
Tomas Geffner
Karsten Kreis
Weili Nie
Yilun Xu
J. Leskovec
Stefano Ermon
Arash Vahdat
DiffM
77
14
0
28 Oct 2024
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Scaling Diffusion Language Models via Adaptation from Autoregressive Models
Shansan Gong
Shivam Agarwal
Yizhe Zhang
Jiacheng Ye
Lin Zheng
...
Peilin Zhao
W. Bi
Jiawei Han
Hao Peng
Dianbo Sui
AI4CE
102
24
0
23 Oct 2024
Real-Time Video Generation with Pyramid Attention Broadcast
Real-Time Video Generation with Pyramid Attention Broadcast
Xuanlei Zhao
Xiaolong Jin
Kai Wang
Yang You
VGen
DiffM
93
37
0
22 Aug 2024
Promises, Outlooks and Challenges of Diffusion Language Modeling
Promises, Outlooks and Challenges of Diffusion Language Modeling
Justin Deschenaux
Çağlar Gülçehre
DiffM
67
3
0
17 Jun 2024
DiTFastAttn: Attention Compression for Diffusion Transformer Models
DiTFastAttn: Attention Compression for Diffusion Transformer Models
Zhihang Yuan
Pu Lu
Hanling Zhang
Xuefei Ning
Linfeng Zhang
Tianchen Zhao
Shengen Yan
Guohao Dai
Yu Wang
70
25
0
12 Jun 2024
Simple and Effective Masked Diffusion Language Models
Simple and Effective Masked Diffusion Language Models
Subham Sekhar Sahoo
Marianne Arriola
Yair Schiff
Aaron Gokaslan
Edgar Marroquin
Justin T Chiu
Alexander M. Rush
Volodymyr Kuleshov
DiffM
86
91
0
11 Jun 2024
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
Learning-to-Cache: Accelerating Diffusion Transformer via Layer Caching
Xinyin Ma
Gongfan Fang
Michael Bi Mi
Xinchao Wang
78
34
0
03 Jun 2024
$Δ$-DiT: A Training-Free Acceleration Method Tailored for Diffusion
  Transformers
ΔΔΔ-DiT: A Training-Free Acceleration Method Tailored for Diffusion Transformers
Pengtao Chen
Mingzhu Shen
Peng Ye
Jianjian Cao
Chongjun Tu
C. Bouganis
Yiren Zhao
Tao Chen
81
36
0
03 Jun 2024
Hash3D: Training-free Acceleration for 3D Generation
Hash3D: Training-free Acceleration for 3D Generation
Xingyi Yang
Xinchao Wang
3DGS
59
12
0
09 Apr 2024
Faster Diffusion via Temporal Attention Decomposition
Faster Diffusion via Temporal Attention Decomposition
Haozhe Liu
Wentian Zhang
Jinheng Xie
Francesco Faccio
Mengmeng Xu
Tao Xiang
Mike Zheng Shou
Juan-Manuel Perez-Rua
Jürgen Schmidhuber
DiffM
112
21
0
03 Apr 2024
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
Patrick Esser
Sumith Kulal
A. Blattmann
Rahim Entezari
Jonas Muller
...
Zion English
Kyle Lacey
Alex Goodwin
Yannik Marek
Robin Rombach
DiffM
212
1,244
0
05 Mar 2024
Fast Timing-Conditioned Latent Audio Diffusion
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
93
106
0
07 Feb 2024
Pard: Permutation-Invariant Autoregressive Diffusion for Graph
  Generation
Pard: Permutation-Invariant Autoregressive Diffusion for Graph Generation
Lingxiao Zhao
Xueying Ding
Leman Akoglu
DiffM
48
9
0
06 Feb 2024
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache
  Quantization
KVQuant: Towards 10 Million Context Length LLM Inference with KV Cache Quantization
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Michael W. Mahoney
Y. Shao
Kurt Keutzer
A. Gholami
MQ
59
194
0
31 Jan 2024
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized
  Large Language Model Serving
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving
Yinmin Zhong
Shengyu Liu
Junda Chen
Jianbo Hu
Yibo Zhu
Xuanzhe Liu
Xin Jin
Hao Zhang
62
185
0
18 Jan 2024
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Cache Me if You Can: Accelerating Diffusion Models through Block Caching
Felix Wimbauer
Bichen Wu
Edgar Schoenfeld
Xiaoliang Dai
Ji Hou
...
Jonas Kohler
Christian Rupprecht
Daniel Cremers
Peter Vajda
Jialiang Wang
DiffM
50
62
0
06 Dec 2023
DeepCache: Accelerating Diffusion Models for Free
DeepCache: Accelerating Diffusion Models for Free
Xinyin Ma
Gongfan Fang
Xinchao Wang
58
132
0
01 Dec 2023
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
GPQA: A Graduate-Level Google-Proof Q&A Benchmark
David Rein
Betty Li Hou
Asa Cooper Stickland
Jackson Petty
Richard Yuanzhe Pang
Julien Dirani
Julian Michael
Samuel R. Bowman
AI4MH
ELM
72
627
0
20 Nov 2023
Discrete Diffusion Modeling by Estimating the Ratios of the Data
  Distribution
Discrete Diffusion Modeling by Estimating the Ratios of the Data Distribution
Aaron Lou
Chenlin Meng
Stefano Ermon
DiffM
67
91
0
25 Oct 2023
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Model Tells You What to Discard: Adaptive KV Cache Compression for LLMs
Suyu Ge
Yunan Zhang
Liyuan Liu
Minjia Zhang
Jiawei Han
Jianfeng Gao
34
231
0
03 Oct 2023
Let's Verify Step by Step
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
120
1,044
0
31 May 2023
Scissorhands: Exploiting the Persistence of Importance Hypothesis for
  LLM KV Cache Compression at Test Time
Scissorhands: Exploiting the Persistence of Importance Hypothesis for LLM KV Cache Compression at Test Time
Zichang Liu
Aditya Desai
Fangshuo Liao
Weitao Wang
Victor Xie
Zhaozhuo Xu
Anastasios Kyrillidis
Anshumali Shrivastava
57
218
0
26 May 2023
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head
  Checkpoints
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie
James Lee-Thorp
Michiel de Jong
Yury Zemlyanskiy
Federico Lebrón
Sumit Sanghai
61
626
0
22 May 2023
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
AR-Diffusion: Auto-Regressive Diffusion Model for Text Generation
Tong Wu
Zhihao Fan
Xiao Liu
Yeyun Gong
Yelong Shen
...
Juntao Li
Zhongyu Wei
Jian Guo
Nan Duan
Weizhu Chen
VLM
99
58
0
16 May 2023
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
DINOISER: Diffused Conditional Sequence Learning by Manipulating Noises
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Mingxuan Wang
DiffM
68
48
0
20 Feb 2023
Scalable Diffusion Models with Transformers
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
68
2,182
0
19 Dec 2022
DiffusionBERT: Improving Generative Masked Language Models with
  Diffusion Models
DiffusionBERT: Improving Generative Masked Language Models with Diffusion Models
Zhengfu He
Tianxiang Sun
Kuan-Chieh Wang
Xuanjing Huang
Xipeng Qiu
DiffM
VLM
55
126
0
28 Nov 2022
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for
  Text Generation and Modular Control
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Xiaochuang Han
Sachin Kumar
Yulia Tsvetkov
82
85
0
31 Oct 2022
Diffusion-LM Improves Controllable Text Generation
Diffusion-LM Improves Controllable Text Generation
Xiang Lisa Li
John Thickstun
Ishaan Gulrajani
Percy Liang
Tatsunori B. Hashimoto
AI4CE
216
802
0
27 May 2022
MaskGIT: Masked Generative Image Transformer
MaskGIT: Masked Generative Image Transformer
Huiwen Chang
Han Zhang
Lu Jiang
Ce Liu
William T. Freeman
ViT
92
664
0
08 Feb 2022
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
209
4,175
0
27 Oct 2021
Program Synthesis with Large Language Models
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
135
1,893
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
175
5,328
0
07 Jul 2021
Structured Denoising Diffusion Models in Discrete State-Spaces
Structured Denoising Diffusion Models in Discrete State-Spaces
Jacob Austin
Daniel D. Johnson
Jonathan Ho
Daniel Tarlow
Rianne van den Berg
DiffM
109
900
0
07 Jul 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
150
2,307
0
20 Apr 2021
Argmax Flows and Multinomial Diffusion: Learning Categorical
  Distributions
Argmax Flows and Multinomial Diffusion: Learning Categorical Distributions
Emiel Hoogeboom
Didrik Nielsen
P. Jaini
Patrick Forré
Max Welling
DiffM
273
414
0
10 Feb 2021
Measuring Massive Multitask Language Understanding
Measuring Massive Multitask Language Understanding
Dan Hendrycks
Collin Burns
Steven Basart
Andy Zou
Mantas Mazeika
D. Song
Jacob Steinhardt
ELM
RALM
146
4,222
0
07 Sep 2020
Denoising Diffusion Probabilistic Models
Denoising Diffusion Probabilistic Models
Jonathan Ho
Ajay Jain
Pieter Abbeel
DiffM
375
17,550
0
19 Jun 2020
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
542
41,106
0
28 May 2020
Generative Modeling by Estimating Gradients of the Data Distribution
Generative Modeling by Estimating Gradients of the Data Distribution
Yang Song
Stefano Ermon
SyDa
DiffM
191
3,803
0
12 Jul 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
156
3,714
0
09 Jan 2019
Non-Autoregressive Neural Machine Translation
Non-Autoregressive Neural Machine Translation
Jiatao Gu
James Bradbury
Caiming Xiong
Victor O.K. Li
R. Socher
84
793
0
07 Nov 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
514
129,831
0
12 Jun 2017
1