Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2401.15077
Cited By
v1
v2
v3 (latest)
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
26 January 2024
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty"
50 / 162 papers shown
Title
PredGen: Accelerated Inference of Large Language Models through Input-Time Speculation for Real-Time Speech Interaction
Shufan Li
Aditya Grover
17
0
0
18 Jun 2025
S
4
^4
4
C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models
Tao He
Guang Huang
Yu Yang
Tianshi Xu
Sicheng Zhao
Guiguang Ding
Pengyang Wang
Feng Tian
LRM
28
0
0
17 Jun 2025
LiteVLM: A Low-Latency Vision-Language Model Inference Pipeline for Resource-Constrained Environments
Jin Huang
Yuchao Jin
Le An
Josh Park
VLM
19
0
0
09 Jun 2025
Gumbel-max List Sampling for Distribution Coupling with Multiple Samples
Joseph Rowan
Buu Phan
Ashish Khisti
36
0
0
05 Jun 2025
Accelerated Test-Time Scaling with Model-Free Speculative Sampling
Woomin Song
Saket Dingliwal
Sai Muralidhar Jayanthi
Bhavana Ganesh
Jinwoo Shin
Aram Galstyan
S. Bodapati
LRM
110
0
0
05 Jun 2025
Rectified Sparse Attention
Yutao Sun
Tianzhu Ye
Li Dong
Yuqing Xia
Jian Chen
Yizhao Gao
S. Cao
Jianyong Wang
Furu Wei
82
1
0
04 Jun 2025
Out-of-Vocabulary Sampling Boosts Speculative Decoding
Nadav Timor
Jonathan Mamou
Oren Pereg
Hongyang Zhang
David Harel
OODD
34
0
0
02 Jun 2025
Mamba Drafters for Speculative Decoding
Daewon Choi
Seunghyuk Oh
Saket Dingliwal
Jihoon Tack
Kyuyoung Kim
...
Insu Han
Jinwoo Shin
Aram Galstyan
Shubham Katiyar
S. Bodapati
43
0
0
01 Jun 2025
CLaSp: In-Context Layer Skip for Self-Speculative Decoding
Longze Chen
Renke Shan
Huiming Wang
Lu Wang
Ziqiang Liu
Run Luo
Jiawei Wang
Hamid Alinejad-Rokny
Min Yang
37
0
0
30 May 2025
Speculative Decoding Meets Quantization: Compatibility Evaluation and Hierarchical Framework Design
Yudi Zhang
Weilin Zhao
Xu Han
Tiejun Zhao
Wang Xu
Hailong Cao
Conghui Zhu
MQ
58
1
0
28 May 2025
SpecExtend: A Drop-in Enhancement for Speculative Decoding of Long Sequences
Jungyoub Cha
Hyunjong Kim
Sungzoon Cho
VLM
80
0
0
27 May 2025
R2R: Efficiently Navigating Divergent Reasoning Paths with Small-Large Model Token Routing
Tianyu Fu
Yi Ge
Yichen You
Enshu Liu
Zhihang Yuan
Guohao Dai
Shengen Yan
Huazhong Yang
Yu Wang
MoE
LRM
70
1
0
27 May 2025
Faster and Better LLMs via Latency-Aware Test-Time Scaling
Zili Wang
Tianyu Zhang
Haoli Bai
Lu Hou
Xianzhi Yu
Wulong Liu
Shiming Xiang
Lei Zhu
LRM
91
0
0
26 May 2025
Sparse-to-Dense: A Free Lunch for Lossless Acceleration of Video Understanding in LLMs
Xuan Zhang
Cunxiao Du
Sicheng Yu
Jiawei Wu
Fengzhuo Zhang
Wei Gao
Qian Liu
65
0
0
25 May 2025
Do Large Language Models (Really) Need Statistical Foundations?
Weijie Su
274
0
0
25 May 2025
Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
Yixuan Wang
Yijun Liu
Shiyu Ji
Yuzhuang Xu
Yang Xu
Qingfu Zhu
Wanxiang Che
OffRL
LRM
54
0
0
24 May 2025
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu
Xiaobo Xia
Weixiang Zhao
Manyi Zhang
Xianzhi Yu
Xiu Su
Shuo Yang
See-Kiong Ng
Tat-Seng Chua
KELM
LRM
94
0
0
23 May 2025
VeriThinker: Learning to Verify Makes Reasoning Model Efficient
Zigeng Chen
Xinyin Ma
Gongfan Fang
Ruonan Yu
Xinchao Wang
LRM
167
1
0
23 May 2025
BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms
Yunlong Hou
Fengzhuo Zhang
Cunxiao Du
Xuan Zhang
Jiachun Pan
Tianyu Pang
Chao Du
Vincent Y. F. Tan
Zhuoran Yang
OffRL
122
1
0
21 May 2025
Alignment-Augmented Speculative Decoding with Alignment Sampling and Conditional Verification
Jikai Wang
Zhenxu Tian
Jilong Li
Qingrong Xia
Xinyu Duan
Zhefeng Wang
Baoxing Huai
Min Zhang
61
0
0
19 May 2025
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
Zihua Wang
Ruibo Li
Haozhe Du
Joey Tianyi Zhou
Yu Zhang
Xu Yang
MLLM
133
0
0
19 May 2025
SpecEdge: Scalable Edge-Assisted Serving Framework for Interactive LLMs
Jinwoo Park
Seunggeun Cho
Dongsu Han
84
0
0
16 May 2025
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan
Siyang Song
Ankur Aggarwal
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
VLM
150
0
0
15 May 2025
Multi-Token Prediction Needs Registers
Anastasios Gerontopoulos
Spyros Gidaris
N. Komodakis
118
0
0
15 May 2025
Insights into DeepSeek-V3: Scaling Challenges and Reflections on Hardware for AI Architectures
Chenggang Zhao
Chengqi Deng
Chong Ruan
Damai Dai
Huazuo Gao
...
Wenfeng Liang
Ying He
Yun Wang
Yuxuan Liu
Y. X. Wei
MoE
74
1
0
14 May 2025
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu
Jianian Zhu
Yongqian Li
Haojie Wang
Biao Hou
Jidong Zhai
140
1
0
12 May 2025
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
135
0
0
08 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
80
2
0
06 May 2025
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
Mingxuan Yuan
Jianye Hao
Feng Wu
ReLM
LRM
156
1
0
03 May 2025
PipeSpec: Breaking Stage Dependencies in Hierarchical LLM Decoding
Bradley McDanel
Shanghang Zhang
Y. Hu
Zining Liu
MoE
441
0
0
02 May 2025
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang
Junlin Li
Jianye Hou
Hao Fei
Lijun Wu
Min Zhang
LLMAG
LRM
136
5
0
27 Apr 2025
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Ziqiang Liu
Dong Li
E. Barsoum
180
0
0
23 Apr 2025
Advancing AI-assisted Hardware Design with Hierarchical Decentralized Training and Personalized Inference-Time Optimization
Hao Mark Chen
Zehuan Zhang
Wanru Zhao
Nicholas D. Lane
Hongxiang Fan
25
0
0
21 Apr 2025
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
Jiaming Xu
Jiayi Pan
Yongkang Zhou
Siming Chen
Jiajian Li
Yaoxiu Lian
Junyi Wu
Guohao Dai
LRM
67
0
0
11 Apr 2025
SD
2
^2
2
: Self-Distilled Sparse Drafters
Mike Lasby
Nish Sinnadurai
Valavan Manohararajah
Sean Lie
Yani Andrew Ioannou
Vithursan Thangarasa
417
1
0
10 Apr 2025
Hogwild! Inference: Parallel LLM Generation via Concurrent Attention
Gleb Rodionov
Roman Garipov
Alina Shutova
George Yakushev
Erik Schultheis
Vage Egiazarian
Anton Sinitsin
Denis Kuznedelev
Dan Alistarh
LRM
149
5
0
08 Apr 2025
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
Yuhao Wang
Heyang Liu
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
458
3
0
05 Apr 2025
PipeDec: Low-Latency Pipeline-based Inference with Dynamic Speculative Decoding towards Large-scale Models
Haofei Yin
Mengbai Xiao
Rouzhou Lu
Xiao Zhang
Dongxiao Yu
Guanghui Zhang
AI4CE
79
0
0
05 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Hui Yuan
Lefei Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
120
1
0
31 Mar 2025
A Survey of Efficient Reasoning for Large Reasoning Models: Language, Multimodality, and Beyond
Xiaoye Qu
Yafu Li
Zhaochen Su
Weigao Sun
Jianhao Yan
...
Chaochao Lu
Yue Zhang
Xian-Sheng Hua
Bowen Zhou
Yu Cheng
ReLM
OffRL
LRM
186
47
0
27 Mar 2025
PCM : Picard Consistency Model for Fast Parallel Sampling of Diffusion Models
Junhyuk So
Jiwoong Shin
Chaeyeon Jang
Eunhyeok Park
DiffM
129
0
0
25 Mar 2025
A Novel Hat-Shaped Device-Cloud Collaborative Inference Framework for Large Language Models
Zuan Xie
Yang Xu
Hongli Xu
Yunming Liao
Zhiwei Yao
138
0
0
23 Mar 2025
ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
E. Georganas
Dhiraj D. Kalamkar
Alexander Kozlov
A. Heinecke
MQ
439
1
0
17 Mar 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
Jiajun Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
121
0
0
13 Mar 2025
Speculative Decoding for Multi-Sample Inference
Yiwei Li
Jiayi Shi
Shaoxiong Feng
Peiwen Yuan
Xinyu Wang
...
Ji Zhang
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
LRM
96
1
0
07 Mar 2025
SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
Kaiyu Huang
Yu Wang
Zhubo Shi
Han Zou
Minchen Yu
Qingjiang Shi
LRM
99
2
0
07 Mar 2025
RASD: Retrieval-Augmented Speculative Decoding
Guofeng Quan
Wenfeng Feng
Chuzhan Hao
Guochao Jiang
Yuewei Zhang
Hao Wang
RALM
161
1
0
05 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
252
18
0
03 Mar 2025
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Kai Lv
Honglin Guo
Qipeng Guo
Xipeng Qiu
103
0
0
02 Mar 2025
Tutorial Proposal: Speculative Decoding for Efficient LLM Inference
Heming Xia
Cunxiao Du
Yongqian Li
Qian Liu
Wenjie Li
89
0
0
01 Mar 2025
1
2
3
4
Next