SPEED: Speculative Pipelined Execution for Efficient Decoding
arXiv:2310.12072
18 October 2023
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genç, Kurt Keutzer, A. Gholami, Y. Shao
Papers citing "SPEED: Speculative Pipelined Execution for Efficient Decoding" (8 of 8 papers shown)
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang
03 Mar 2025

Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Artem Basharin, Andrei Chertkov, Ivan Oseledets
23 Oct 2024

SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia, Yongqi Li, Jun Zhang, Cunxiao Du, Wenjie Li
09 Oct 2024

PipeInfer: Accelerating LLM Inference using Asynchronous Pipelined Speculation
Branden Butler, Sixing Yu, Arya Mazaheri, Ali Jannesari
16 Jul 2024

EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang
26 Jan 2024

BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Feng-Huei Lin, Hanling Yi, Hongbin Li, Yifan Yang, Xiaotian Yu, Guangming Lu, Rong Xiao
23 Jan 2024

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Yongqi Li, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui
15 Jan 2024

Teaching Machines to Read and Comprehend
Karl Moritz Hermann, Tomáš Kočiský, Edward Grefenstette, L. Espeholt, W. Kay, Mustafa Suleyman, Phil Blunsom
10 Jun 2015