Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2401.15077
Cited By
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
26 January 2024
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
Re-assign community
ArXiv
PDF
HTML
Papers citing
"EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty"
50 / 106 papers shown
Title
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan
Shri Kiran Srinivasan
Ankur Aggarwal
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
VLM
27
0
0
15 May 2025
Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos
Spyros Gidaris
N. Komodakis
24
0
0
15 May 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yishuo Wang
Yuxuan Liu
Y. X. Wei
MoE
41
0
0
14 May 2025
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu
Jianian Zhu
Yongqian Li
Haojie Wang
Biao Hou
Jidong Zhai
40
0
0
12 May 2025
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
48
0
0
08 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
32
0
0
06 May 2025
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
M. Yuan
Jianye Hao
Feng Wu
ReLM
LRM
59
0
0
03 May 2025
PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding
Bradley McDanel
S. Zhang
Y. Hu
Zining Liu
MoE
116
0
0
02 May 2025
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang
J. Li
Lijun Wu
M. Zhang
LLMAG
LRM
69
2
0
27 Apr 2025
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Ziqiang Liu
Dong Li
E. Barsoum
58
0
0
23 Apr 2025
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
Jiaming Xu
Jiayi Pan
Yongkang Zhou
Siming Chen
J. Li
Yaoxiu Lian
Junyi Wu
Guohao Dai
LRM
35
0
0
11 Apr 2025
SD
2
^2
2
: Self-Distilled Sparse Drafters
Mike Lasby
Nish Sinnadurai
Valavan Manohararajah
Sean Lie
Vithursan Thangarasa
143
1
0
10 Apr 2025
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Gleb Rodionov
Roman Garipov
Alina Shutova
George Yakushev
Vage Egiazarian
Anton Sinitsin
Denis Kuznedelev
Dan Alistarh
LRM
27
2
0
08 Apr 2025
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
Yuhao Wang
Heyang Liu
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
151
0
0
05 Apr 2025
PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
24
0
0
05 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Zehan Li
L. Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
59
0
0
31 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
88
16
0
27 Mar 2025
PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
Junhyuk So
Jiwoong Shin
Chaeyeon Jang
Eunhyeok Park
DiffM
50
0
0
25 Mar 2025
A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models
Zuan Xie
Yang Xu
Hongli Xu
Yunming Liao
Zhiwei Yao
53
0
0
23 Mar 2025
ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
E. Georganas
Dhiraj D. Kalamkar
Alexander Kozlov
A. Heinecke
MQ
136
0
0
17 Mar 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
Jiajun Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
61
0
0
13 Mar 2025
SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
Kaiyu Huang
Yu Wang
Zhubo Shi
Han Zou
Minchen Yu
Qingjiang Shi
LRM
36
1
0
07 Mar 2025
Speculative Decoding for Multi-Sample Inference
Yiwei Li
Jiayi Shi
Shaoxiong Feng
Peiwen Yuan
Xinyu Wang
...
Ji Zhang
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
LRM
44
0
0
07 Mar 2025
RASD: Retrieval-Augmented Speculative Decoding
Guofeng Quan
Wenfeng Feng
Chuzhan Hao
Guochao Jiang
Yuewei Zhang
Hao Wang
RALM
77
1
0
05 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
120
6
0
03 Mar 2025
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Kai Lv
Honglin Guo
Qipeng Guo
Xipeng Qiu
41
0
0
02 Mar 2025
Tutorial Proposal: Speculative Decoding for Efficient LLM Inference
Heming Xia
Cunxiao Du
Yongqian Li
Qian Liu
Wenjie Li
39
0
0
01 Mar 2025
Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff
Maximilian Holsman
Yukun Huang
Bhuwan Dhingra
41
0
0
28 Feb 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
S. Zhang
93
0
0
27 Feb 2025
Towards Optimal Multi-draft Speculative Decoding
Z. Hu
Tong Zheng
Vignesh Viswanathan
Ziyi Chen
Ryan Rossi
Yihan Wu
Dinesh Manocha
Heng Huang
42
3
0
26 Feb 2025
DReSD: Dense Retrieval for Speculative Decoding
Milan Gritta
Huiyin Xue
Gerasimos Lampouras
RALM
100
0
0
24 Feb 2025
TETRIS: Optimal Draft Token Selection for Batch Speculative Decoding
Zhaoxuan Wu
Zijian Zhou
Arun Verma
Alok Prakash
Daniela Rus
Bryan Kian Hsiang Low
60
0
0
24 Feb 2025
CORAL: Learning Consistent Representations across Multi-step Training with Lighter Speculative Drafter
Yepeng Weng
Dianwen Mei
Huishi Qiu
Xujie Chen
Li Liu
Jiang Tian
Zhongchao Shi
50
0
0
24 Feb 2025
CodeSwift: Accelerating LLM Inference for Efficient Code Generation
Qianhui Zhao
L. Zhang
Fang Liu
Xiaoli Lian
Qiaoyuanhe Meng
Ziqian Jiao
Zetong Zhou
Borui Zhang
Runlin Guo
Jia Li
41
0
0
24 Feb 2025
C2T: A Classifier-Based Tree Construction Method in Speculative Decoding
Feiye Huo
Jianchao Tan
Kaipeng Zhang
Xunliang Cai
Shengli Sun
44
0
0
20 Feb 2025
GRIFFIN: Effective Token Alignment for Faster Speculative Decoding
Shijing Hu
Jingyang Li
Xingyu Xie
Zhihui Lu
Kim-Chuan Toh
Pan Zhou
40
0
0
16 Feb 2025
Lossless Acceleration of Large Language Models with Hierarchical Drafting based on Temporal Locality in Speculative Decoding
Sukmin Cho
S. Choi
T. Hwang
Jeongyeon Seo
Soyeong Jeong
Huije Lee
Hoyun Song
Jong C. Park
Youngjin Kwon
51
0
0
08 Feb 2025
QuantSpec: Self-Speculative Decoding with Hierarchical Quantized KV Cache
Rishabh Tiwari
Haocheng Xi
Aditya Tomar
Coleman Hooper
Sehoon Kim
Maxwell Horton
Mahyar Najibi
Michael W. Mahoney
Kemal Kurniawan
Amir Gholami
MQ
58
1
0
05 Feb 2025
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann
Sotiris Anagnostidis
Albert Pumarola
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Edgar Schönfeld
Ali K. Thabet
Jonas Kohler
ALM
BDL
95
6
0
31 Jan 2025
Tackling the Dynamicity in a Production LLM Serving System with SOTA Optimizations via Hybrid Prefill/Decode/Verify Scheduling on Efficient Meta-kernels
Mingcong Song
Xinru Tang
Fengfan Hou
Jing Li
Wei Wei
...
Hongjie Si
D. Jiang
Shouyi Yin
Yang Hu
Guoping Long
36
1
0
24 Dec 2024
Parallelized Autoregressive Visual Generation
Yunhong Wang
Shuhuai Ren
Zhijie Lin
Yujin Han
Haoyuan Guo
Zhenheng Yang
Difan Zou
Jiashi Feng
Xihui Liu
VGen
90
12
0
19 Dec 2024
Falcon: Faster and Parallel Inference of Large Language Models through Enhanced Semi-Autoregressive Drafting and Custom-Designed Decoding Tree
Xiangxiang Gao
Weisheng Xie
Yiwei Xiang
Feng Ji
82
6
0
17 Dec 2024
VLsI: Verbalized Layers-to-Interactions from Large to Small Vision Language Models
Byung-Kwan Lee
Ryo Hachiuma
Yu-Chiang Frank Wang
Y. Ro
Yueh-Hua Wu
VLM
81
0
0
02 Dec 2024
PLD+: Accelerating LLM inference by leveraging Language Model Artifacts
Shwetha Somasundaram
Anirudh Phukan
Apoorv Saxena
80
1
0
02 Dec 2024
Speculative Decoding with CTC-based Draft Model for LLM Inference Acceleration
Zhuofan Wen
Shangtong Gui
Yang Feng
96
2
0
25 Nov 2024
Closer Look at Efficient Inference Methods: A Survey of Speculative Decoding
Hyun Ryu
Eric Kim
77
3
0
20 Nov 2024
SAM Decoding: Speculative Decoding via Suffix Automaton
Yuxuan Hu
Ke Wang
Jing Zhang
Fanjin Zhang
C. Li
H. Chen
Jing Zhang
49
1
0
16 Nov 2024
SpecHub: Provable Acceleration to Multi-Draft Speculative Decoding
Ryan Sun
Tianyi Zhou
Xun Chen
Lichao Sun
32
4
0
08 Nov 2024
The N-Grammys: Accelerating Autoregressive Inference with Learning-Free Batched Speculation
Lawrence Stewart
Matthew Trager
Sujan Kumar Gonugondla
Stefano Soatto
53
5
0
06 Nov 2024
Accelerated AI Inference via Dynamic Execution Methods
Haim Barad
Jascha Achterberg
Tien Pei Chou
Jean Yu
31
0
0
30 Oct 2024
1
2
3
Next