EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang
26 January 2024 (arXiv:2401.15077)

Papers citing "EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty"

50 / 162 papers shown

EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li, Fangyun Wei, Chao Zhang, Hongyang R. Zhang
24 Jun 2024

Towards Fast Multilingual LLM Inference: Speculative Decoding and Specialized Drafters
Euiin Yi, Taehyeon Kim, Hongseok Jeung, Du-Seong Chang, Se-Young Yun
24 Jun 2024

Optimizing Speculative Decoding for Serving Large Language Models Using Goodput
Xiaoxuan Liu, Cade Daniel, Langxiang Hu, Woosuk Kwon, Zhuohan Li, Xiangxi Mo, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
20 Jun 2024

PowerInfer-2: Fast Large Language Model Inference on a Smartphone
Zhenliang Xue, Yixin Song, Zeyu Mi, Le Chen, Yubin Xia, Haibo Chen
10 Jun 2024

RU-AI: A Large Multimodal Dataset for Machine Generated Content Detection
Liting Huang, Zhihao Zhang, Yiran Zhang, Xiyue Zhou, Shoujin Wang
07 Jun 2024

Block Transformer: Global-to-Local Language Modeling for Fast Inference
Namgyu Ho, Sangmin Bae, Taehyeon Kim, Hyunjik Jo, Yireun Kim, Tal Schuster, Adam Fisch, James Thorne, Se-Young Yun
04 Jun 2024

S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong, Manasa Bharadwaj
30 May 2024

SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang, Xudong Guo, Mengdi Wang
30 May 2024

Faster Cascades via Speculative Decoding
Harikrishna Narasimhan, Wittawat Jitkrittum, A. S. Rawat, Seungyeon Kim, Neha Gupta, A. Menon, Sanjiv Kumar
29 May 2024

Hardware-Aware Parallel Prompt Decoding for Memory-Efficient Acceleration of LLM Inference
Hao Mark Chen, Wayne Luk, Ka-Fai Cedric Yiu, Rui Li, Konstantin Mishchenko, Stylianos I. Venieris, Hongxiang Fan
28 May 2024

Integer Scale: A Free Lunch for Faster Fine-grained Quantization of LLMs
Qingyuan Li, Ran Meng, Yiduo Li, Bo Zhang, Yifan Lu, Yerui Sun, Lin Ma, Yuchen Xie
23 May 2024

Atomic Self-Consistency for Better Long Form Generations
Raghuveer Thirukovalluru, Yukun Huang, Bhuwan Dhingra
21 May 2024

EMS-SD: Efficient Multi-sample Speculative Decoding for Accelerating Large Language Models
Yunsheng Ni, Chuanjian Liu, Yehui Tang, Kai Han, Yunhe Wang
13 May 2024

Dynamic Speculation Lookahead Accelerates Speculative Decoding of Large Language Models
Jonathan Mamou, Oren Pereg, Daniel Korat, Moshe Berchansky, Nadav Timor, Moshe Wasserblat, Roy Schwartz
07 May 2024

Accelerating Production LLMs with Combined Token/Embedding Speculators
Davis Wertheimer, Joshua Rosenkranz, Thomas Parnell, Sahil Suneja, Pavithra Ranganathan, R. Ganti, Mudhakar Srivatsa
29 Apr 2024

Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Fangcheng Liu, Yehui Tang, Zhenhua Liu, Yunsheng Ni, Kai Han, Yunhe Wang
29 Apr 2024

BASS: Batched Attention-optimized Speculative Sampling
Haifeng Qian, Sujan Kumar Gonugondla, Sungsoo Ha, Mingyue Shang, Sanjay Krishna Gouda, Ramesh Nallapati, Sudipta Sengupta, Xiaofei Ma, Anoop Deoras
24 Apr 2024

TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun, Zhuoming Chen, Xinyu Yang, Yuandong Tian, Beidi Chen
18 Apr 2024

Transformer-Lite: High-efficiency Deployment of Large Language Models on Mobile Phone GPUs
Luchang Li, Sheng Qian, Jie Lu, Lunxi Yuan, Rui Wang, Qin Xie
29 Mar 2024

SDSAT: Accelerating LLM Inference through Speculative Decoding with Semantic Adaptive Tokens
Chengbo Liu, Yong Zhu
27 Mar 2024

Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Aonan Zhang, Chong-Jun Wang, Yi Wang, Xuanyu Zhang, Yunfei Cheng
14 Mar 2024

Rethinking Generative Large Language Model Evaluation for Semantic Comprehension
Fangyun Wei, Xi Chen, Linzi Luo
12 Mar 2024

LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan, Yuzhang Shang, Yang Zhou, Zhen Dong, Zhe Zhou, ..., Yong Jae Lee, Yan Yan, Beidi Chen, Guangyu Sun, Kurt Keutzer
26 Feb 2024

Ouroboros: Generating Longer Drafts Phrase by Phrase for Faster Speculative Decoding
Weilin Zhao, Yuxiang Huang, Xu Han, Wang Xu, Chaojun Xiao, Xinrong Zhang, Yewei Fang, Kaihuo Zhang, Zhiyuan Liu, Maosong Sun
21 Feb 2024

DriveVLM: The Convergence of Autonomous Driving and Large Vision-Language Models
Xiaoyu Tian, Junru Gu, Bailin Li, Yicheng Liu, Yang Wang, Chenxu Hu, Kun Zhan, Peng Jia, Xianpeng Lang, Hang Zhao
19 Feb 2024

Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu, Peter Bailis, Ion Stoica, Hao Zhang
03 Feb 2024

Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia, Zhe Yang, Qingxiu Dong, Peiyi Wang, Chak Tou Leong, Tao Ge, Tianyu Liu, Wenjie Li, Zhifang Sui
15 Jan 2024

Mixtral of Experts
Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, A. Mensch, Blanche Savary, ..., Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed
08 Jan 2024

TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang, Guangtao Zeng, Tianduo Wang, Wei Lu
04 Jan 2024

Cascade Speculative Drafting for Even Faster LLM Inference
Ziyi Chen, Xiaocong Yang, Jiacheng Lin, Chenkai Sun, Kevin Chen-Chuan Chang, Jie Huang
18 Dec 2023

Mamba: Linear-Time Sequence Modeling with Selective State Spaces
Albert Gu, Tri Dao
01 Dec 2023

REST: Retrieval-Based Speculative Decoding
Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He
14 Nov 2023

SpecTr: Fast Speculative Decoding via Optimal Transport
Ziteng Sun, A. Suresh, Jae Hun Ro, Ahmad Beirami, Himanshu Jain, Felix X. Yu
23 Oct 2023

SPEED: Speculative Pipelined Execution for Efficient Decoding
Coleman Hooper, Sehoon Kim, Hiva Mohammadzadeh, Hasan Genç, Kurt Keutzer, A. Gholami, Y. Shao
18 Oct 2023

DistillSpec: Improving Speculative Decoding via Knowledge Distillation
Yongchao Zhou, Kaifeng Lyu, A. S. Rawat, A. Menon, Afshin Rostamizadeh, Sanjiv Kumar, Jean-François Kagy, Rishabh Agarwal
12 Oct 2023

Online Speculative Decoding
Xiaoxuan Liu, Lanxiang Hu, Peter Bailis, Alvin Cheung, Zhijie Deng, Ion Stoica, Hao Zhang
11 Oct 2023

NEFTune: Noisy Embeddings Improve Instruction Finetuning
Neel Jain, Ping Yeh-Chiang, Yuxin Wen, John Kirchenbauer, Hong-Min Chu, ..., Avi Schwarzschild, Aniruddha Saha, Micah Goldblum, Jonas Geiping, Tom Goldstein
09 Oct 2023

Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector, Chris Ré
08 Aug 2023

Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom
18 Jul 2023

Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang, Gibbeum Lee, Jaewoong Cho, Dimitris Papailiopoulos, Kangwook Lee
12 Jul 2023

Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, ..., Dacheng Li, Eric Xing, Haotong Zhang, Joseph E. Gonzalez, Ion Stoica
09 Jun 2023

Accelerating Transformer Inference for Translation via Parallel Decoding
Andrea Santilli, Silvio Severino, Emilian Postolache, Valentino Maiorca, Michele Mancusi, R. Marin, Emanuele Rodolà
17 May 2023

Inference with Reference: Lossless Acceleration of Large Language Models
Nan Yang, Tao Ge, Liang Wang, Binxing Jiao, Daxin Jiang, Linjun Yang, Rangan Majumder, Furu Wei
10 Apr 2023

Speculative Decoding with Big Little Decoder
Sehoon Kim, K. Mangalam, Suhong Moon, Jitendra Malik, Michael W. Mahoney, A. Gholami, Kurt Keutzer
15 Feb 2023

Accelerating Large Language Model Decoding with Speculative Sampling
Charlie Chen, Sebastian Borgeaud, G. Irving, Jean-Baptiste Lespiau, Laurent Sifre, J. Jumper
02 Feb 2023

Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan, Matan Kalman, Yossi Matias
30 Nov 2022

Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation
Heming Xia, Tao Ge, Peiyi Wang, Si-Qing Chen, Furu Wei, Zhifang Sui
30 Mar 2022

The Optimal BERT Surgeon: Scalable and Accurate Second-Order Pruning for Large Language Models
Eldar Kurtic, Daniel Fernando Campos, Tuan Nguyen, Elias Frantar, Mark Kurtz, Ben Fineran, Michael Goin, Dan Alistarh
14 Mar 2022

Training Verifiers to Solve Math Word Problems
K. Cobbe, V. Kosaraju, Mohammad Bavarian, Mark Chen, Heewoo Jun, ..., Jerry Tworek, Jacob Hilton, Reiichiro Nakano, Christopher Hesse, John Schulman
27 Oct 2021

Evaluating Large Language Models Trained on Code
Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Pondé, ..., Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, Wojciech Zaremba
07 Jul 2021