2404.19124
Cited By
Accelerating Production LLMs with Combined Token/Embedding Speculators
29 April 2024
Davis Wertheimer, Joshua Rosenkranz, Thomas Parnell, Sahil Suneja, Pavithra Ranganathan, R. Ganti, M. Srivatsa
Papers citing "Accelerating Production LLMs with Combined Token/Embedding Speculators"
4 / 4 papers shown
Title
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann, Sotiris Anagnostidis, Albert Pumarola, Markos Georgopoulos, A. Sanakoyeu, Yuming Du, Edgar Schönfeld, Ali K. Thabet, Jonas Kohler
ALM
BDL
95
6
0
31 Jan 2025
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer, Milan Gritta, Gerasimos Lampouras, Haitham Bou Ammar, Jun Wang
76
4
0
04 Oct 2024
Hydra: Sequentially-Dependent Draft Heads for Medusa Decoding
Zack Ankner, Rishab Parthasarathy, Aniruddha Nrusimha, Christopher Rinard, Jonathan Ragan-Kelley, William Brandon
29
25
0
07 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
130
141
0
03 Feb 2024