Accelerating Large Language Model Decoding with Speculative Sampling

2 February 2023
Charlie Chen
Sebastian Borgeaud
G. Irving
Jean-Baptiste Lespiau
Laurent Sifre
J. Jumper
    BDL
    LRM
ArXiv · PDF · HTML

Papers citing "Accelerating Large Language Model Decoding with Speculative Sampling"

50 / 316 papers shown
SSR: Speculative Parallel Scaling Reasoning in Test-time
Yuanlin Chu
Bo Wang
Xiang Liu
Hong Chen
Aiwei Liu
Xuming Hu
ReLM
LRM
16
0
0
21 May 2025
Accelerating Autoregressive Speech Synthesis Inference With Speech Speculative Decoding
Zijian Lin
Yang Zhang
Yougen Yuan
Yuming Yan
Jinjiang Liu
Zhiyong Wu
Pengfei Hu
Qun Yu
15
0
0
21 May 2025
BanditSpec: Adaptive Speculative Decoding via Bandit Algorithms
Yunlong Hou
Fengzhuo Zhang
Cunxiao Du
Xuan Zhang
Jiachun Pan
Tianyu Pang
Chao Du
Vincent Y. F. Tan
Zhuoran Yang
OffRL
12
0
0
21 May 2025
FLASH: Latent-Aware Semi-Autoregressive Speculative Decoding for Multimodal Tasks
Zihua Wang
Ruibo Li
Haozhe Du
Joey Tianyi Zhou
Yu Zhang
Xu Yang
MLLM
17
0
0
19 May 2025
Accelerating Adaptive Retrieval Augmented Generation via Instruction-Driven Representation Reduction of Retrieval Overlaps
Jie Ou
Jinyu Guo
Shuaihong Jiang
Zhaokun Wang
Libo Qin
Shunyu Yao
Wenhong Tian
3DV
22
0
0
19 May 2025
Communication-Efficient Hybrid Language Model via Uncertainty-Aware Opportunistic and Compressed Transmission
Seungeun Oh
Jinhyuk Kim
Jihong Park
Seung-Woo Ko
Jinho Choi
Tony Q. S. Quek
Seong-Lyun Kim
19
0
0
17 May 2025
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan
Shri Kiran Srinivasan
Ankur Aggarwal
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
VLM
27
0
0
15 May 2025
CEC-Zero: Chinese Error Correction Solution Based on LLM
Sophie Zhang
Zhiming Lin
26
0
0
14 May 2025
Automatic Task Detection and Heterogeneous LLM Speculative Decoding
Danying Ge
Jianhua Gao
Qizhi Jiang
Yifei Feng
Weixing Ji
44
0
0
13 May 2025
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
48
0
0
08 May 2025
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Harry Dong
Bilge Acun
Beidi Chen
Yuejie Chi
LRM
34
0
0
08 May 2025
Diffusion Models are Secretly Exchangeable: Parallelizing DDPMs via Autospeculation
Hengyuan Hu
Aniket Das
Dorsa Sadigh
Nima Anari
DiffM
28
0
0
06 May 2025
RetroInfer: A Vector-Storage Approach for Scalable Long-Context LLM Inference
Yushen Chen
Jiawei Zhang
Baotong Lu
Qianxi Zhang
Chengruidong Zhang
...
Chen Chen
Mingxing Zhang
Yuqing Yang
Fan Yang
Mao Yang
38
0
0
05 May 2025
Accelerating Large Language Model Reasoning via Speculative Search
Zhihai Wang
Jie Wang
Jilai Pan
Xilin Xia
Huiling Zhen
M. Yuan
Jianye Hao
Feng Wu
ReLM
LRM
75
1
0
03 May 2025
Bi-directional Model Cascading with Proxy Confidence
David Warren
Mark Dras
51
0
0
27 Apr 2025
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang
J. Li
Lijun Wu
Mengdi Zhang
LLMAG
LRM
69
2
0
27 Apr 2025
Energy Considerations of Large Language Model Inference and Efficiency Optimizations
Jared Fernandez
Clara Na
Vashisth Tiwari
Yonatan Bisk
Sasha Luccioni
Emma Strubell
46
0
0
24 Apr 2025
SplitReason: Learning To Offload Reasoning
Yash Akhauri
Anthony Fei
Chi-chih Chang
Ahmed F. AbouElhamayed
Yueying Li
Mohamed S. Abdelfattah
OffRL
ReLM
LRM
51
0
0
23 Apr 2025
PARD: Accelerating LLM Inference with Low-Cost PARallel Draft Model Adaptation
Zihao An
Huajun Bai
Zichen Liu
Dong Li
E. Barsoum
61
0
0
23 Apr 2025
Context-Enhanced Contrastive Search for Improved LLM Text Generation
Jaydip Sen
Rohit Pandey
Hetvi Waghela
55
0
0
22 Apr 2025
Speculative Sampling via Exponential Races
Szymon Kobus
Deniz Gündüz
LRM
35
0
0
21 Apr 2025
Collaborative Learning of On-Device Small Model and Cloud-Based Large Model: Advances and Future Directions
Chaoyue Niu
Yucheng Ding
Junhui Lu
Zhengxiang Huang
Hang Zeng
Yutong Dai
Xuezhen Tu
Chengfei Lv
Fan Wu
Guihai Chen
35
1
0
17 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xueliang Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
70
6
0
15 Apr 2025
Efficient Distributed Retrieval-Augmented Generation for Enhancing Language Model Performance
Shixuan Liu
Zhenzhe Zheng
Xiaoyao Huang
Fan Wu
Guihai Chen
Jie Wu
35
0
0
15 Apr 2025
Jupiter: Fast and Resource-Efficient Collaborative Inference of Generative LLMs on Edge Devices
Shengyuan Ye
Bei Ouyang
Liekang Zeng
Tianyi Qian
Xiaowen Chu
Jian Tang
Xu Chen
34
1
0
11 Apr 2025
SpecEE: Accelerating Large Language Model Inference with Speculative Early Exiting
Jiaming Xu
Jiayi Pan
Yongkang Zhou
Siming Chen
Jiyang Li
Yaoxiu Lian
Junyi Wu
Guohao Dai
LRM
40
0
0
11 Apr 2025
SD$^2$: Self-Distilled Sparse Drafters
Mike Lasby
Nish Sinnadurai
Valavan Manohararajah
Sean Lie
Vithursan Thangarasa
187
1
0
10 Apr 2025
DEL: Context-Aware Dynamic Exit Layer for Efficient Self-Speculative Decoding
Hossein Entezari Zarch
Lei Gao
Chaoyi Jiang
Murali Annavaram
LRM
33
0
0
08 Apr 2025
SPIRe: Boosting LLM Inference Throughput with Speculative Decoding
Sanjit Neelam
Daniel Heinlein
Vaclav Cvicek
Akshay Mishra
Reiner Pope
LRM
43
0
0
08 Apr 2025
Scaling Test-Time Inference with Policy-Optimized, Dynamic Retrieval-Augmented Generation via KV Caching and Decoding
Sakhinana Sagar Srinivas
Akash Das
Shivam Gupta
Venkataramana Runkana
OffRL
58
1
0
02 Apr 2025
Collaborative LLM Numerical Reasoning with Local Data Protection
Min Zhang
Yuzhe Lu
Yun Zhou
Panpan Xu
Lin Lee Cheong
Chang-Tien Lu
Haozhu Wang
55
0
0
01 Apr 2025
Model Hemorrhage and the Robustness Limits of Large Language Models
Ziyang Ma
Zhiyu Li
Lefei Zhang
Gui-Song Xia
Bo Du
Liangpei Zhang
Dacheng Tao
62
0
0
31 Mar 2025
Token-Driven GammaTune: Adaptive Calibration for Enhanced Speculative Decoding
Aayush Gautam
Susav Shrestha
Narasimha Annapareddy
56
0
0
28 Mar 2025
Speculative Decoding for Verilog: Speed and Quality, All in One
Changran Xu
Yi Liu
Yunhao Zhou
Shan Huang
Ningyi Xu
Qiang Xu
53
0
0
18 Mar 2025
ML-SpecQD: Multi-Level Speculative Decoding with Quantized Drafts
E. Georganas
Dhiraj D. Kalamkar
Alexander Kozlov
A. Heinecke
MQ
210
0
0
17 Mar 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM
63
0
0
13 Mar 2025
Gumiho: A Hybrid Architecture to Prioritize Early Tokens in Speculative Decoding
Jiajun Li
Yixing Xu
Haiduo Huang
Xuanwu Yin
D. Li
Edith C. -H. Ngai
E. Barsoum
61
0
0
13 Mar 2025
G-Boost: Boosting Private SLMs with General LLMs
Yijiang Fan
Yuren Mao
Longbin Lai
Ying Zhang
Zhengping Qian
Yunjun Gao
46
0
0
13 Mar 2025
Collaborative Speculative Inference for Efficient LLM Inference Serving
Luyao Gao
Jianchun Liu
Hongli Xu
Xichong Zhang
Yunming Liao
Liusheng Huang
46
0
0
13 Mar 2025
Position-Aware Depth Decay Decoding (D$^3$): Boosting Large Language Model Inference Efficiency
Siqi Fan
Xuezhi Fang
Xingrun Xing
Peng Han
Shuo Shang
Yequan Wang
68
0
0
11 Mar 2025
Training Domain Draft Models for Speculative Decoding: Best Practices and Insights
Fenglu Hong
Ravi Raju
Jonathan Li
Bo Li
Urmish Thakker
Avinash Ravichandran
Swayambhoo Jain
Changran Hu
48
0
0
10 Mar 2025
DistiLLM-2: A Contrastive Approach Boosts the Distillation of LLMs
Jongwoo Ko
Tianyi Chen
Sungnyun Kim
Tianyu Ding
Luming Liang
Ilya Zharkov
Se-Young Yun
VLM
225
0
0
10 Mar 2025
Queueing, Predictions, and LLMs: Challenges and Open Problems
Michael Mitzenmacher
Rana Shahout
AI4TS
LRM
44
1
0
10 Mar 2025
Speculative Decoding for Multi-Sample Inference
Yiwei Li
Jiayi Shi
Shaoxiong Feng
Peiwen Yuan
Xueliang Wang
...
Ji Zhang
Chuyi Tan
Boyuan Pan
Yao Hu
Kan Li
LRM
49
0
0
07 Mar 2025
SpecServe: Efficient and SLO-Aware Large Language Model Serving with Adaptive Speculative Decoding
Kaiyu Huang
Yu Wang
Zhubo Shi
Han Zou
Minchen Yu
Qingjiang Shi
LRM
49
2
0
07 Mar 2025
Balcony: A Lightweight Approach to Dynamic Inference of Generative Language Models
Benyamin Jamialahmadi
Parsa Kavehzadeh
Mehdi Rezagholizadeh
Parsa Farinneya
Hossein Rajabzadeh
A. Jafari
Boxing Chen
Marzieh S. Tahaei
52
0
0
06 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
52
0
0
06 Mar 2025
RASD: Retrieval-Augmented Speculative Decoding
Guofeng Quan
Wenfeng Feng
Chuzhan Hao
Guochao Jiang
Yuewei Zhang
Hao Wang
RALM
85
1
0
05 Mar 2025
EAGLE-3: Scaling up Inference Acceleration of Large Language Models via Training-Time Test
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
123
6
0
03 Mar 2025
DuoDecoding: Hardware-aware Heterogeneous Speculative Decoding with Dynamic Multi-Sequence Drafting
Kai Lv
Honglin Guo
Qipeng Guo
Xipeng Qiu
41
0
0
02 Mar 2025