ResearchTrend.AI
Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation

30 March 2022
Heming Xia
Tao Ge
Peiyi Wang
Si-Qing Chen
Furu Wei
Zhifang Sui

Papers citing "Speculative Decoding: Exploiting Speculative Execution for Accelerating Seq2seq Generation"

21 / 21 papers shown
Efficient Reasoning for LLMs through Speculative Chain-of-Thought
Jikai Wang
J. Li
Lijun Wu
Hao Fei
LLMAG
LRM
69
2
0
27 Apr 2025
Speculate, then Collaborate: Fusing Knowledge of Language Models during Decoding
Zihan Wang
Muneeza Azmat
Ang Li
R. Horesh
Mikhail Yurochkin
118
1
0
11 Feb 2025
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann
Sotiris Anagnostidis
Albert Pumarola
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Edgar Schönfeld
Ali K. Thabet
Jonas Kohler
ALM
BDL
101
7
0
31 Jan 2025
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
Vithursan Thangarasa
Ganesh Venkatesh
Mike Lasby
Nish Sinnadurai
Sean Lie
SyDa
38
1
0
13 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
53
6
0
09 Oct 2024
Efficient Inference for Large Language Model-based Generative Recommendation
Xinyu Lin
Chaoqun Yang
Wenjie Wang
Yongqi Li
Cunxiao Du
Fuli Feng
See-Kiong Ng
Tat-Seng Chua
67
4
0
07 Oct 2024
Mixture of Attentions For Speculative Decoding
Matthieu Zimmer
Milan Gritta
Gerasimos Lampouras
Haitham Bou Ammar
Jun Wang
76
4
0
04 Oct 2024
MagicDec: Breaking the Latency-Throughput Tradeoff for Long Context Generation with Speculative Decoding
Jian Chen
Vashisth Tiwari
Ranajoy Sadhukhan
Zhuoming Chen
Jinyuan Shi
Ian En-Hsu Yen
Avner May
Tianqi Chen
Beidi Chen
LRM
39
22
0
20 Aug 2024
Speculative Diffusion Decoding: Accelerating Language Generation through Diffusion
Jacob K Christopher
Brian Bartoldson
Tal Ben-Nun
Michael Cardei
B. Kailkhura
Ferdinando Fioretto
DiffM
56
3
0
10 Aug 2024
Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting
Zilong Wang
Zifeng Wang
Long Le
Huaixiu Steven Zheng
Swaroop Mishra
...
Anush Mattapalli
Ankur Taly
Jingbo Shang
Chen-Yu Lee
Tomas Pfister
RALM
83
33
0
11 Jul 2024
Merge, Ensemble, and Cooperate! A Survey on Collaborative Strategies in the Era of Large Language Models
Jinliang Lu
Ziliang Pang
Min Xiao
Yaochen Zhu
Rui Xia
Jiajun Zhang
MoMe
52
18
0
08 Jul 2024
OPT-Tree: Speculative Decoding with Adaptive Draft Tree Structure
Jikai Wang
Yi Su
Juntao Li
Qingrong Xia
Zi Ye
Xinyu Duan
Zhefeng Wang
Min Zhang
46
12
0
25 Jun 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong
Manasa Bharadwaj
47
5
0
30 May 2024
Decoding Speculative Decoding
Minghao Yan
Saurabh Agarwal
Shivaram Venkataraman
LRM
35
6
0
02 Feb 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
44
123
0
26 Jan 2024
BiTA: Bi-Directional Tuning for Lossless Acceleration in Large Language Models
Feng-Huei Lin
Hanling Yi
Hongbin Li
Yifan Yang
Xiaotian Yu
Guangming Lu
Rong Xiao
41
3
0
23 Jan 2024
A Survey on Hardware Accelerators for Large Language Models
C. Kachris
33
14
0
18 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Yongqi Li
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
38
101
0
15 Jan 2024
Non-Autoregressive Neural Machine Translation: A Call for Clarity
Robin M. Schmidt
Telmo Pires
Stephan Peitz
J. Lööf
29
9
0
21 May 2022
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Tao Ge
Heming Xia
Xin Sun
Si-Qing Chen
Furu Wei
85
18
0
20 May 2022
Teaching Machines to Read and Comprehend
Karl Moritz Hermann
Tomás Kociský
Edward Grefenstette
L. Espeholt
W. Kay
Mustafa Suleyman
Phil Blunsom
196
3,513
0
10 Jun 2015