2401.06706
Cited By
Multi-Candidate Speculative Decoding
12 January 2024
Sen Yang
Shujian Huang
Xinyu Dai
Jiajun Chen
BDL
Papers citing
"Multi-Candidate Speculative Decoding"
14 / 14 papers shown
Title
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
M. Yuan
Jianye Hao
Feng Wu
ReLM
LRM
75
1
0
03 May 2025
Collaborative Speculative Inference for Efficient LLM Inference Serving
Luyao Gao
Jianchun Liu
Hongli Xu
Xichong Zhang
Yunming Liao
Liusheng Huang
48
0
0
13 Mar 2025
SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
Kaiyu Huang
Yu Wang
Zhubo Shi
Han Zou
Minchen Yu
Qingjiang Shi
LRM
54
2
0
07 Mar 2025
Towards Optimal Multi-draft Speculative Decoding
Zhibo Hu
Tong Zheng
Vignesh Viswanathan
Ziyi Chen
Ryan Rossi
Yihan Wu
Dinesh Manocha
Heng Huang
47
4
0
26 Feb 2025
A Theoretical Perspective for Speculative Decoding Algorithm
Ming Yin
Minshuo Chen
Kaixuan Huang
Mengdi Wang
37
5
0
30 Oct 2024
Improving Multi-candidate Speculative Decoding
Xiaofan Lu
Yixiao Zeng
Feiyang Ma
Zixu Yu
Marco Levorato
39
0
0
16 Sep 2024
Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Euiin Yi
Taehyeon Kim
Hongseok Jeung
Du-Seong Chang
Se-Young Yun
48
4
0
24 Jun 2024
Optimized Speculative Sampling for GPU Hardware Accelerators
Dominik Wagner
Seanie Lee
Ilja Baumann
Philipp Seeberger
Korbinian Riedhammer
Tobias Bocklet
48
3
0
16 Jun 2024
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang
Xudong Guo
Mengdi Wang
61
20
0
30 May 2024
Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
Hao Mark Chen
Wayne Luk
Ka-Fai Cedric Yiu
Rui Li
Konstantin Mishchenko
Stylianos I. Venieris
Hongxiang Fan
49
7
0
28 May 2024
Accelerating Production LLMs with Combined Token/Embedding Speculators
Davis Wertheimer
Joshua Rosenkranz
Thomas Parnell
Sahil Suneja
Pavithra Ranganathan
R. Ganti
Mudhakar Srivatsa
48
4
0
29 Apr 2024
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Fangcheng Liu
Yehui Tang
Zhenhua Liu
Yunsheng Ni
Kai Han
Yunhe Wang
51
24
0
29 Apr 2024
Online Speculative Decoding
Xiaoxuan Liu
Lanxiang Hu
Peter Bailis
Alvin Cheung
Zhijie Deng
Ion Stoica
Hao Zhang
29
53
0
11 Oct 2023
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mańke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
91
154
0
17 Sep 2021