arXiv:2302.01318
Accelerating Large Language Model Decoding with Speculative Sampling
2 February 2023
Charlie Chen, Sebastian Borgeaud, Geoffrey Irving, Jean-Baptiste Lespiau, Laurent Sifre, John Jumper
Papers citing "Accelerating Large Language Model Decoding with Speculative Sampling"
Showing 16 of 316 papers.
GLIMMER: generalized late-interaction memory reranker
Michiel de Jong, Yury Zemlyanskiy, Nicholas FitzGerald, Sumit Sanghai, William W. Cohen, Joshua Ainslie (17 Jun 2023)

SqueezeLLM: Dense-and-Sparse Quantization
Sehoon Kim, Coleman Hooper, Amir Gholami, Zhen Dong, Xiuyu Li, Sheng Shen, Michael W. Mahoney, Kurt Keutzer (13 Jun 2023)

On Optimal Caching and Model Multiplexing for Large Model Inference
Banghua Zhu, Ying Sheng, Lianmin Zheng, Clark W. Barrett, Michael I. Jordan, Jiantao Jiao (03 Jun 2023)

Large Language Models as Tool Makers
Tianle Cai, Xuezhi Wang, Tengyu Ma, Xinyun Chen, Denny Zhou (26 May 2023)

Parallel Sampling of Diffusion Models
Andy Shih, Suneel Belkhale, Stefano Ermon, Dorsa Sadigh, Nima Anari (25 May 2023)

GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
Joshua Ainslie, James Lee-Thorp, Michiel de Jong, Yury Zemlyanskiy, Federico Lebrón, Sumit Sanghai (22 May 2023)

Accelerating Transformer Inference for Translation via Parallel Decoding
Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, R. Marin, Emanuele Rodolà (17 May 2023)

SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
Xupeng Miao, Gabriele Oliaro, Zhihao Zhang, Xinhao Cheng, Zeyu Wang, ..., Chunan Shi, Zhuoming Chen, Daiyaan Arfeen, Reyna Abhyankar, Zhihao Jia (16 May 2023)

Inference with Reference: Lossless Acceleration of Large Language Models
Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei (10 Apr 2023)

Jump to Conclusions: Short-Cutting Transformers With Linear Transformations
Alexander Yom Din, Taelin Karidi, Leshem Choshen, Mor Geva (16 Mar 2023)

Full Stack Optimization of Transformer Inference: a Survey
Sehoon Kim, Coleman Hooper, Thanakul Wattanawong, Minwoo Kang, Ruohan Yan, ..., Qijing Huang, Kurt Keutzer, Michael W. Mahoney, Y. Shao, Amir Gholami (27 Feb 2023)

Speculative Decoding with Big Little Decoder
Sehoon Kim, Karttikeya Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, Amir Gholami, Kurt Keutzer (15 Feb 2023)

Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias (30 Nov 2022)

Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Tao Ge, Heming Xia, Xin Sun, Si-Qing Chen, Furu Wei (20 May 2022)

Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui (30 Mar 2022)

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, Mostofa Patwary, Raul Puri, Patrick LeGresley, Jared Casper, Bryan Catanzaro (17 Sep 2019)