Accelerating Large Language Model Decoding with Speculative Sampling

2 February 2023
Charlie Chen, Sebastian Borgeaud, G. Irving, Jean-Baptiste Lespiau, Laurent Sifre, J. Jumper
BDL · LRM
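
The paper's speculative sampling scheme drafts a short block of tokens with a cheap proposal model and then accepts or rejects each drafted token against the target model, so that accepted tokens are distributed exactly as if they had been sampled from the target. The snippet below is a minimal, illustrative sketch of that accept/reject rule only; draft_model, target_model, and speculative_step are hypothetical toy stand-ins (fixed categorical distributions), not the authors' implementation or a real LLM.

```python
# Minimal sketch of speculative sampling's accept/reject rule (Chen et al., 2023).
# draft_model and target_model are hypothetical toy stand-ins for real LLM
# forward passes: each returns a full next-token distribution for a prefix.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 8  # toy vocabulary size


def draft_model(prefix):
    # Cheap proposal distribution q(. | prefix) -- a toy categorical here.
    logits = np.sin(np.arange(VOCAB) + len(prefix))
    q = np.exp(logits)
    return q / q.sum()


def target_model(prefix):
    # Expensive target distribution p(. | prefix) -- also a toy categorical.
    logits = np.cos(0.7 * np.arange(VOCAB) + len(prefix))
    p = np.exp(logits)
    return p / p.sum()


def speculative_step(prefix, k=4):
    """Draft k tokens with the small model, then accept/reject them so the
    returned tokens are distributed as if sampled from the target model."""
    drafted, q_dists = [], []
    ctx = list(prefix)
    for _ in range(k):                        # 1) autoregressive drafting
        q = draft_model(ctx)
        tok = int(rng.choice(VOCAB, p=q))
        drafted.append(tok)
        q_dists.append(q)
        ctx.append(tok)

    # 2) score prefix + draft with the target (one parallel pass in practice)
    p_dists = [target_model(list(prefix) + drafted[:i]) for i in range(k + 1)]

    out = list(prefix)
    for i, tok in enumerate(drafted):         # 3) modified rejection sampling
        p, q = p_dists[i], q_dists[i]
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)                   # accept the drafted token
        else:
            residual = np.maximum(p - q, 0.0)
            total = residual.sum()
            resample = residual / total if total > 0 else p
            out.append(int(rng.choice(VOCAB, p=resample)))
            return out                        # stop at the first rejection
    # 4) every draft accepted: take one extra token from the target distribution
    out.append(int(rng.choice(VOCAB, p=p_dists[k])))
    return out


print(speculative_step([1, 2, 3]))
```

The speedup comes from step 2: the target model scores all k drafted tokens in one forward pass instead of k sequential ones, while the accept/reject rule keeps the output distribution unchanged.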

Papers citing "Accelerating Large Language Model Decoding with Speculative Sampling"

50 / 316 papers shown
Coarse-to-Fine Highlighting: Reducing Knowledge Hallucination in Large Language Models
  Qitan Lv, Jie Wang, Hanzhu Chen, Bin Li, Yongdong Zhang, Feng Wu
  HILM · 19 Oct 2024
MoDification: Mixture of Depths Made Easy
  C. Zhang, M. Zhong, Qimeng Wang, Xuantao Lu, Zheyu Ye, ..., Yan Gao, Yao Hu, Kehai Chen, Min Zhang, Dawei Song
  VLM, MoE · 18 Oct 2024
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
  Jiahao Qiu, Yifu Lu, Yifan Zeng, Jiacheng Guo, Jiayi Geng, Huazheng Wang, Kaixuan Huang, Yue Wu, Mengdi Wang
  18 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
  Tan Dat Nguyen, Ji-Hoon Kim, Jeongsoo Choi, Shukjae Choi, Jinseok Park, Younglo Lee, Joon Son Chung
  17 Oct 2024
Cerberus: Efficient Inference with Adaptive Parallel Decoding and Sequential Knowledge Enhancement
  Yuxuan Liu, Wenyuan Li, Laizhong Cui, Hailiang Yang
  OffRL · 17 Oct 2024
Learning to Route LLMs with Confidence Tokens
  Yu-Neng Chuang, Helen Zhou, Prathusha Kameswara Sarma, Parikshit Gopalan, John Boccio, Sara Bolouki, Xia Hu
  17 Oct 2024
DySpec: Faster Speculative Decoding with Dynamic Token Tree Structure
  Yunfan Xiong, Ruoyu Zhang, Yanzeng Li, Tianhao Wu, Lei Zou
  15 Oct 2024
QSpec: Speculative Decoding with Complementary Quantization Schemes
  Juntao Zhao, Wenhao Lu, Sheng Wang, Lingpeng Kong, Chuan Wu
  MQ · 15 Oct 2024
Speculative Knowledge Distillation: Bridging the Teacher-Student Gap Through Interleaved Sampling
  Wenyuan Xu, Rujun Han, Zhenting Wang, L. Le, Dhruv Madeka, Lei Li, Wenjie Wang, Rishabh Agarwal, Chen-Yu Lee, Tomas Pfister
  15 Oct 2024
Customize Your Visual Autoregressive Recipe with Set Autoregressive Modeling
  Wenze Liu, Le Zhuo, Yi Xin, Sheng Xia, Peng Gao, Xiangyu Yue
  14 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
  Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie
  SyDa · 13 Oct 2024
COrAL: Order-Agnostic Language Modeling for Efficient Iterative Refinement
  Yuxi Xie, Anirudh Goyal, Xiaobao Wu, Xunjian Yin, Xiao Xu, Min-Yen Kan, Liangming Pan, William Yang Wang
  LRM · 12 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
  Heming Xia, Yongqi Li, Jun Zhang, Cunxiao Du, Wenjie Li
  LRM · 09 Oct 2024
Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level
  Xinyi Zeng, Yuying Shang, Yutao Zhu, Jingyuan Zhang, Yu Tian
  AAML · 09 Oct 2024
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding
  Zilin Xiao, Hongming Zhang, Tao Ge, Siru Ouyang, Vicente Ordonez, Dong Yu
  08 Oct 2024
LANTERN: Accelerating Visual Autoregressive Models with Relaxed Speculative Decoding
  Doohyuk Jang, Sihwan Park, J. Yang, Yeonsung Jung, Jihun Yun, Souvik Kundu, Sung-Yub Kim, Eunho Yang
  04 Oct 2024
Interpretable Contrastive Monte Carlo Tree Search Reasoning
  Zitian Gao, Boye Niu, Xuzheng He, Haotian Xu, Hongzhang Liu, Aiwei Liu, Xuming Hu, Lijie Wen
  LRM · 02 Oct 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
  Yao Teng, Han Shi, Xian Liu, Xuefei Ning, Guohao Dai, Yu Wang, Zhenguo Li, Xihui Liu
  02 Oct 2024
Draft on the Fly: Adaptive Self-Speculative Decoding using Cosine Similarity
  Michael R. Metel, Peng Lu, Boxing Chen, Mehdi Rezagholizadeh, I. Kobyzev
  01 Oct 2024
Interactive Speculative Planning: Enhance Agent Efficiency through Co-design of System and User Interface
  Wenyue Hua, Mengting Wan, Shashank Vadrevu, Ryan Nadel, Yongfeng Zhang, Chi Wang
  LLMAG · 30 Sep 2024
Dynamic-Width Speculative Beam Decoding for Efficient LLM Inference
  Zongyue Qin, Zifan He, Neha Prakriya, Jason Cong, Yizhou Sun
  25 Sep 2024
Whisper in Medusa's Ear: Multi-head Efficient Decoding for Transformer-based ASR
  Yael Segal-Feldman, Aviv Shamsian, Aviv Navon, Gill Hetz, Joseph Keshet
  24 Sep 2024
Efficiently Dispatching Flash Attention For Partially Filled Attention Masks
  Agniv Sharma, Jonas Geiping
  23 Sep 2024
What is the Role of Small Models in the LLM Era: A Survey
  Lihu Chen, Gaël Varoquaux
  ALM · 10 Sep 2024
NDP: Next Distribution Prediction as a More Broad Target
  Junhao Ruan, Abudukeyumu Abudula, Xinyu Liu, Bei Li, Yinqiao Li, Chenglong Wang, Yuchun Fan, Yuan Ge, Tong Xiao, Jingbo Zhu
  30 Aug 2024
Boosting Lossless Speculative Decoding via Feature Sampling and Partial Alignment Distillation
  Lujun Gui, Bin Xiao, Lei Su, Weipeng Chen
  28 Aug 2024
Learning Harmonized Representations for Speculative Sampling
  Lefan Zhang, Xiaodan Wang, Yanhua Huang, Ruiwen Xu
  28 Aug 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
  Jian Chen, Vashisth Tiwari, Ranajoy Sadhukhan, Zhuoming Chen, Jinyuan Shi, Ian En-Hsu Yen, Avner May, Tianqi Chen, Beidi Chen
  LRM · 20 Aug 2024
Parallel Sampling via Counting
  Nima Anari, Ruiquan Gao, Aviad Rubinstein
  18 Aug 2024
Context-Aware Assistant Selection for Improved Inference Acceleration with Large Language Models
  Jerry Huang, Prasanna Parthasarathi, Mehdi Rezagholizadeh, Sarath Chandar
  16 Aug 2024
Turning Trash into Treasure: Accelerating Inference of Large Language Models with Token Recycling
  Xianzhen Luo, Yixuan Wang, Qingfu Zhu, Zhiming Zhang, Xuanyu Zhang, Qing Yang, Dongliang Xu
  16 Aug 2024
P/D-Serve: Serving Disaggregated Large Language Model at Scale
  Yibo Jin, Tao Wang, Huimin Lin, Mingyang Song, Peiyang Li, ..., Haoliang Cheng, Xiaojing Li, Jiandong Ding, Hefei Guo, Zhengyong Zhang
  MoE · 15 Aug 2024
KOALA: Enhancing Speculative Decoding for LLM via Multi-Layer Draft Heads with Adversarial Learning
  Kaiqi Zhang, Jing Zhao, Rui Chen
  15 Aug 2024
Coupling without Communication and Drafter-Invariant Speculative Decoding
  Majid Daliri, Christopher Musco, A. Suresh
  15 Aug 2024
PEARL: Parallel Speculative Decoding with Adaptive Draft Length
  Tianyu Liu, Yun Li, Qitan Lv, Kai Liu, Jianchen Zhu, Winston Hu, Xingchen Sun
  13 Aug 2024
Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding
  Yunjia Xi, Hangyu Wang, Bo Chen, Jianghao Lin, Menghui Zhu, Wen Liu, Ruiming Tang, Zhewei Wei, Wenbo Zhang, Yong Yu
  OffRL · 11 Aug 2024
Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
  Jacob K Christopher, Brian Bartoldson, Tal Ben-Nun, Michael Cardei, B. Kailkhura, Ferdinando Fioretto
  DiffM · 10 Aug 2024
CREST: Effectively Compacting a Datastore For Retrieval-Based Speculative Decoding
  Sophia Ho, Jinsol Park, Patrick Wang
  08 Aug 2024
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
  Bin Xiao, Lujun Gui, Lei Su, Weipeng Chen
  01 Aug 2024
Inference acceleration for large language models using "stairs" assisted greedy generation
  Domas Grigaliunas, M. Lukoševičius
  29 Jul 2024
Graph-Structured Speculative Decoding
  Zhuocheng Gong, Jiahao Liu, Ziyue Wang, Pengfei Wu, Jingang Wang, Xunliang Cai, Dongyan Zhao, Rui Yan
  23 Jul 2024
Attention Is All You Need But You Don't Need All Of It For Inference of Large Language Models
  Georgy Tyukin, G. Dovonon, Jean Kaddour, Pasquale Minervini
  LRM · 22 Jul 2024
Longhorn: State Space Models are Amortized Online Learners
  Bo Liu, Rui Wang, Lemeng Wu, Yihao Feng, Peter Stone, Qian Liu
  19 Jul 2024
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
  Chenze Shao, Fandong Meng, Jie Zhou
  17 Jul 2024
Inference Optimization of Foundation Models on AI Accelerators
  Youngsuk Park, Kailash Budhathoki, Liangfu Chen, Jonas M. Kübler, Jiaji Huang, Matthäus Kleindessner, Jun Huan, V. Cevher, Yida Wang, George Karypis
  12 Jul 2024
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
  Zilong Wang, Zifeng Wang, Long Le, Huaixiu Steven Zheng, Swaroop Mishra, ..., Anush Mattapalli, Ankur Taly, Jingbo Shang, Chen-Yu Lee, Tomas Pfister
  RALM · 11 Jul 2024
Speculative Speech Recognition by Audio-Prefixed Low-Rank Adaptation of Language Models
  Bolaji Yusuf, M. Baskar, Andrew Rosenberg, Bhuvana Ramabhadran
  05 Jul 2024
Let the Code LLM Edit Itself When You Edit the Code
  Zhenyu He, Jun Zhang, Shengjie Luo, Jingjing Xu, Z. Zhang, Di He
  KELM · 03 Jul 2024
S2D: Sorted Speculative Decoding For More Efficient Deployment of Nested Large Language Models
  Parsa Kavehzadeh, Mohammadreza Pourreza, Mojtaba Valipour, Tinashu Zhu, Haoli Bai, Ali Ghodsi, Boxing Chen, Mehdi Rezagholizadeh
  02 Jul 2024
Adaptive Draft-Verification for Efficient Large Language Model Decoding
  Xukun Liu, Bowen Lei, Ruqi Zhang, Dongkuan Xu
  27 Jun 2024