arXiv:2403.09919
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
14 March 2024
Aonan Zhang, Chong-Jun Wang, Yi Wang, Xuanyu Zhang, Yunfei Cheng
Papers citing "Recurrent Drafter for Fast Speculative Decoding in Large Language Models"
5 / 5 papers shown
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, A. Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali K. Thabet, Jonas Kohler
ALM, BDL
93
6
0
31 Jan 2025
Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
Jacob K Christopher, Brian Bartoldson, Tal Ben-Nun, Michael Cardei, B. Kailkhura, Ferdinando Fioretto
DiffM
53
3
0
10 Aug 2024
Speculative Streaming: Fast LLM Inference without Auxiliary Models
Nikhil Bhendawade, Irina Belousova, Qichen Fu, Henry Mason, Mohammad Rastegari, Mahyar Najibi
LRM
34
27
0
16 Feb 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Zack Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon
26
25
0
07 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
127
139
0
03 Feb 2024