arXiv 2302.01318
Accelerating Large Language Model Decoding with Speculative Sampling
2 February 2023
Charlie Chen
Sebastian Borgeaud
G. Irving
Jean-Baptiste Lespiau
Laurent Sifre
J. Jumper
BDL
LRM
Papers citing "Accelerating Large Language Model Decoding with Speculative Sampling"
50 / 316 papers shown
Weak-to-Strong Jailbreaking on Large Language Models
Xuandong Zhao
Xianjun Yang
Tianyu Pang
Chao Du
Lei Li
Yu-Xiang Wang
William Y. Wang
34
55
0
30 Jan 2024
Infini-gram: Scaling Unbounded n-gram Language Models to a Trillion Tokens
Jiacheng Liu
Sewon Min
Luke Zettlemoyer
Yejin Choi
Hannaneh Hajishirzi
51
52
0
30 Jan 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
52
128
0
26 Jan 2024
Accelerating Retrieval-Augmented Language Model Serving with Speculation
Zhihao Zhang
Alan Zhu
Lijie Yang
Yihua Xu
Lanting Li
P. Phothilimthana
Zhihao Jia
RALM
KELM
56
16
0
25 Jan 2024
MambaByte: Token-free Selective State Space Model
Junxiong Wang
Tushaar Gangavarapu
Jing Nathan Yan
Alexander M. Rush
Mamba
44
37
0
24 Jan 2024
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Feng-Huei Lin
Hanling Yi
Hongbin Li
Yifan Yang
Xiaotian Yu
Guangming Lu
Rong Xiao
41
3
0
23 Jan 2024
Spotting LLMs With Binoculars: Zero-Shot Detection of Machine-Generated Text
Abhimanyu Hans
Avi Schwarzschild
Valeriia Cherepanova
Hamid Kazemi
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
DeLMO
44
87
0
22 Jan 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
Deming Chen
Tri Dao
60
257
0
19 Jan 2024
A Survey on Hardware Accelerators for Large Language Models
C. Kachris
33
14
0
18 Jan 2024
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Yao Lu
Song Bian
Lequn Chen
Yongjun He
Yulong Hui
...
Huanchen Zhang
Minjia Zhang
Qizhen Zhang
Tianyi Zhou
Danyang Zhuo
37
7
0
17 Jan 2024
Inferflow: an Efficient and Highly Configurable Inference Engine for Large Language Models
Shuming Shi
Enbo Zhao
Deng Cai
Leyang Cui
Xinting Huang
Huayang Li
36
3
0
16 Jan 2024
JumpCoder: Go Beyond Autoregressive Coder via Online Modification
Mouxiang Chen
Hao Tian
Zhongxi Liu
Xiaoxue Ren
Jianling Sun
SyDa
KELM
43
2
0
15 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Yongqi Li
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
38
105
0
15 Jan 2024
Multi-Candidate Speculative Decoding
Sen Yang
Shujian Huang
Xinyu Dai
Jiajun Chen
BDL
28
16
0
12 Jan 2024
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Hongyi Jin
Tianqi Chen
Zhihao Jia
73
77
0
23 Dec 2023
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference
Hongzheng Chen
Jiahao Zhang
Yixiao Du
Shaojie Xiang
Zichao Yue
Niansong Zhang
Yaohui Cai
Zhiru Zhang
65
35
0
23 Dec 2023
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao
Zhitian Xie
Chen Liang
Chenyi Zhuang
Jinjie Gu
70
12
0
20 Dec 2023
An Adaptive Placement and Parallelism Framework for Accelerating RLHF Training
Youshao Xiao
Weichang Wu
Zhenglei Zhou
Fagui Mao
Shangchun Zhao
Lin Ju
Lei Liang
Xiaolu Zhang
Jun Zhou
34
5
0
19 Dec 2023
Cascade Speculative Drafting for Even Faster LLM Inference
Ziyi Chen
Xiaocong Yang
Jiacheng Lin
Chenkai Sun
Kevin Chen-Chuan Chang
Jie Huang
LRM
24
48
0
18 Dec 2023
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Yixin Song
Zeyu Mi
Haotong Xie
Haibo Chen
BDL
125
122
0
16 Dec 2023
Stateful Large Language Model Serving with Pensieve
Lingfan Yu
Jinyang Li
RALM
KELM
LLMAG
44
12
0
09 Dec 2023
An LLM Compiler for Parallel Function Calling
Sehoon Kim
Suhong Moon
Ryan Tabrizi
Nicholas Lee
Michael W. Mahoney
Kurt Keutzer
A. Gholami
LRM
24
60
0
07 Dec 2023
ChatGPT's One-year Anniversary: Are Open-Source Large Language Models Catching up?
Hailin Chen
Fangkai Jiao
Xingxuan Li
Chengwei Qin
Mathieu Ravaut
Ruochen Zhao
Caiming Xiong
Chenyu You
ELM
CLL
AI4MH
LRM
ALM
85
27
0
28 Nov 2023
Controlled Text Generation via Language Model Arithmetic
Jasper Dekoninck
Marc Fischer
Luca Beurer-Kellner
Martin Vechev
33
36
0
24 Nov 2023
PaSS: Parallel Speculative Sampling
Giovanni Monea
Armand Joulin
Edouard Grave
MoE
24
32
0
22 Nov 2023
Speculative Contrastive Decoding
Hongyi Yuan
Keming Lu
Fei Huang
Zheng Yuan
Chang Zhou
47
5
0
15 Nov 2023
Fast Chain-of-Thought: A Glance of Future from Parallel Decoding Leads to Answers Faster
Hongxuan Zhang
Zhining Liu
Yao Zhao
Jiaqi Zheng
Chenyi Zhuang
Jinjie Gu
Guihai Chen
LRM
MLLM
25
1
0
14 Nov 2023
REST: Retrieval-Based Speculative Decoding
Zhenyu He
Zexuan Zhong
Tianle Cai
Jason D. Lee
Di He
RALM
28
80
0
14 Nov 2023
Leveraging Speculative Sampling and KV-Cache Optimizations Together for Generative AI using OpenVINO
Haim Barad
Ekaterina Aidova
Yury Gorbachev
SyDa
9
1
0
08 Nov 2023
Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding
Jiali Zeng
Fandong Meng
Yongjing Yin
Jie Zhou
35
11
0
06 Nov 2023
The Synergy of Speculative Decoding and Batching in Serving Large Language Models
Qidong Su
Christina Giannoula
Gennady Pekhimenko
19
10
0
28 Oct 2023
Controlled Decoding from Language Models
Sidharth Mudgal
Jong Lee
H. Ganapathy
Yaguang Li
Tao Wang
...
Michael Collins
Trevor Strohman
Jilin Chen
Alex Beutel
Ahmad Beirami
39
70
0
25 Oct 2023
SpecTr: Fast Speculative Decoding via Optimal Transport
Ziteng Sun
A. Suresh
Jae Hun Ro
Ahmad Beirami
Himanshu Jain
Felix X. Yu
53
68
0
23 Oct 2023
An Emulator for Fine-Tuning Large Language Models using Small Language Models
Eric Mitchell
Rafael Rafailov
Archit Sharma
Chelsea Finn
Christopher D. Manning
ALM
41
53
0
19 Oct 2023
SPEED: Speculative Pipelined Execution for Efficient Decoding
Coleman Hooper
Sehoon Kim
Hiva Mohammadzadeh
Hasan Genç
Kurt Keutzer
A. Gholami
Y. Shao
32
35
0
18 Oct 2023
FiLM: Fill-in Language Models for Any-Order Generation
Tianxiao Shen
Hao-Chun Peng
Ruoqi Shen
Yao Fu
Zaïd Harchaoui
Yejin Choi
46
8
0
15 Oct 2023
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
Mengkang Hu
Yao Mu
Xinmiao Yu
Mingyu Ding
Shiguang Wu
Wenqi Shao
Qiguang Chen
Bin Wang
Yu Qiao
Ping Luo
LLMAG
55
34
0
12 Oct 2023
DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Yongchao Zhou
Kaifeng Lyu
A. S. Rawat
A. Menon
Afshin Rostamizadeh
Sanjiv Kumar
Jean-François Kagy
Rishabh Agarwal
55
84
0
12 Oct 2023
MatFormer: Nested Transformer for Elastic Inference
Devvrit
Sneha Kudugunta
Aditya Kusupati
Tim Dettmers
Kaifeng Chen
...
Yulia Tsvetkov
Hannaneh Hajishirzi
Sham Kakade
Ali Farhadi
Prateek Jain
39
23
0
11 Oct 2023
Online Speculative Decoding
Xiaoxuan Liu
Lanxiang Hu
Peter Bailis
Alvin Cheung
Zhijie Deng
Ion Stoica
Hao Zhang
29
53
0
11 Oct 2023
Fast and Robust Early-Exiting Framework for Autoregressive Language Models with Synchronized Parallel Decoding
Sangmin Bae
Jongwoo Ko
Hwanjun Song
SeYoung Yun
32
55
0
09 Oct 2023
SCALE: Synergized Collaboration of Asymmetric Language Translation Engines
Xin Cheng
Xun Wang
Tao Ge
Si-Qing Chen
Heng Chang
Dongyan Zhao
Rui Yan
69
2
0
29 Sep 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming Liu
Bing Qin
Ting Liu
LRM
AI4CE
37
155
0
27 Sep 2023
Sorted LLaMA: Unlocking the Potential of Intermediate Layers of Large Language Models for Dynamic Inference
Parsa Kavehzadeh
Mojtaba Valipour
Marzieh S. Tahaei
Ali Ghodsi
Boxing Chen
Mehdi Rezagholizadeh
35
6
0
16 Sep 2023
Understanding the Impact of Post-Training Quantization on Large Language Models
Somnath Roy
MQ
38
3
0
11 Sep 2023
LLMCad: Fast and Scalable On-device Large Language Model Inference
Daliang Xu
Wangsong Yin
Xin Jin
Wenjie Qu
Shiyun Wei
Mengwei Xu
Xuanzhe Liu
25
44
0
08 Sep 2023
Efficient Benchmarking of Language Models
Yotam Perlitz
Elron Bandel
Ariel Gera
Ofir Arviv
L. Ein-Dor
Eyal Shnarch
Noam Slonim
Michal Shmueli-Scheuer
Leshem Choshen
ALM
24
24
0
22 Aug 2023
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector
Christopher Ré
25
102
0
08 Aug 2023
RecycleGPT: An Autoregressive Language Model with Recyclable Module
Yu Jiang
Qiaozhi He
Xiaomin Zhuang
Zhihua Wu
Kunpeng Wang
Wenlai Zhao
Guangwen Yang
KELM
30
3
0
07 Aug 2023
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang
Gibbeum Lee
Jaewoong Cho
Dimitris Papailiopoulos
Kangwook Lee
23
33
0
12 Jul 2023