Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.10427
Cited By
Accelerating Transformer Inference for Translation via Parallel Decoding
17 May 2023
Andrea Santilli
Silvio Severino
Emilian Postolache
Valentino Maiorca
Michele Mancusi
R. Marin
Emanuele Rodolà
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Accelerating Transformer Inference for Translation via Parallel Decoding"
26 / 26 papers shown
Title
S
4
^4
4
C: Speculative Sampling with Syntactic and Semantic Coherence for Efficient Inference of Large Language Models
Tao He
Guang Huang
Yu Yang
Tianshi Xu
Sicheng Zhao
Guiguang Ding
Pengyang Wang
Feng Tian
LRM
28
0
0
17 Jun 2025
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
Wenxuan Song
Jiayi Chen
Pengxiang Ding
Yuxin Huang
Han Zhao
Donglin Wang
Haoang Li
18
0
0
16 Jun 2025
Think Before You Accept: Semantic Reflective Verification for Faster Speculative Decoding
Yixuan Wang
Yijun Liu
Shiyu Ji
Yuzhuang Xu
Yang Xu
Qingfu Zhu
Wanxiang Che
OffRL
LRM
54
0
0
24 May 2025
Accelerate TarFlow Sampling with GS-Jacobi Iteration
Ben Liu
Zhen Qin
87
0
0
19 May 2025
FastVAR: Linear Visual Autoregressive Modeling via Cached Token Pruning
Hang Guo
Yawei Li
Taolin Zhang
Jiadong Wang
Tao Dai
Shu-Tao Xia
Luca Benini
162
5
0
30 Mar 2025
Accelerate Parallelizable Reasoning via Parallel Decoding within One Sequence
Yijiong Yu
LRM
AIMat
177
1
0
26 Mar 2025
Collaborative Speculative Inference for Efficient LLM Inference Serving
Luyao Gao
Jianchun Liu
Hongli Xu
Xichong Zhang
Yunming Liao
Liusheng Huang
110
1
0
13 Mar 2025
Block Diffusion: Interpolating Between Autoregressive and Diffusion Language Models
Marianne Arriola
Aaron Gokaslan
Justin T Chiu
Zhihan Yang
Zhixuan Qi
Jiaqi Han
Subham Sekhar Sahoo
Volodymyr Kuleshov
DiffM
277
25
0
12 Mar 2025
DiffPO: Diffusion-styled Preference Optimization for Efficient Inference-Time Alignment of Large Language Models
Ruizhe Chen
Wenhao Chai
Zhifei Yang
Xiaotian Zhang
Qiufeng Wang
Tony Q.S. Quek
Soujanya Poria
Zuozhu Liu
133
1
0
06 Mar 2025
Fuzzy Speculative Decoding for a Tunable Accuracy-Runtime Tradeoff
Maximilian Holsman
Yukun Huang
Bhuwan Dhingra
128
0
0
28 Feb 2025
Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding
Tian Jin
Ellie Y. Cheng
Zack Ankner
Nikunj Saunshi
Blake M. Elias
Amir Yazdanbakhsh
Jonathan Ragan-Kelley
Suvinay Subramanian
Michael Carbin
141
5
0
24 Feb 2025
KVLink: Accelerating Large Language Models via Efficient KV Cache Reuse
Jingbo Yang
Bairu Hou
Wei Wei
Yujia Bao
Shiyu Chang
VLM
188
3
0
21 Feb 2025
Judge Decoding: Faster Speculative Sampling Requires Going Beyond Model Alignment
Gregor Bachmann
Sotiris Anagnostidis
Albert Pumarola
Markos Georgopoulos
A. Sanakoyeu
Yuming Du
Edgar Schönfeld
Ali K. Thabet
Jonas Kohler
ALM
BDL
162
16
0
31 Jan 2025
From Cool Demos to Production-Ready FMware: Core Challenges and a Technology Roadmap
Gopi Krishnan Rajbahadur
G. Oliva
Dayi Lin
Ahmed E. Hassan
120
1
0
28 Jan 2025
Towards Scalable and Stable Parallelization of Nonlinear RNNs
Xavier Gonzalez
Andrew Warrington
Jimmy T.H. Smith
Scott W. Linderman
290
11
0
17 Jan 2025
Faster Language Models with Better Multi-Token Prediction Using Tensor Decomposition
Artem Basharin
Andrei Chertkov
Ivan Oseledets
149
1
0
23 Oct 2024
SWIFT: On-the-Fly Self-Speculative Decoding for LLM Inference Acceleration
Heming Xia
Yongqi Li
Jun Zhang
Cunxiao Du
Wenjie Li
LRM
140
13
0
09 Oct 2024
Closed-Loop Long-Horizon Robotic Planning via Equilibrium Sequence Modeling
Jinghan Li
Zhicheng Sun
Fei Li
243
2
0
02 Oct 2024
Accelerating Auto-regressive Text-to-Image Generation with Training-free Speculative Jacobi Decoding
Yao Teng
Han Shi
Xian Liu
Xuefei Ning
Guohao Dai
Yu Wang
Zhenguo Li
Xihui Liu
124
16
0
02 Oct 2024
Efficiency Unleashed: Inference Acceleration for LLM-based Recommender Systems with Speculative Decoding
Yunjia Xi
Hangyu Wang
Bo Chen
Jianghao Lin
Menghui Zhu
Wen Liu
Ruiming Tang
Zhewei Wei
Weinan Zhang
Yong Yu
OffRL
160
4
0
11 Aug 2024
Inference Optimization of Foundation Models on AI Accelerators
Youngsuk Park
Kailash Budhathoki
Liangfu Chen
Jonas M. Kübler
Jiaji Huang
Matthäus Kleindessner
Jun Huan
Volkan Cevher
Yida Wang
George Karypis
122
5
0
12 Jul 2024
S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs
Wei Zhong
Manasa Bharadwaj
115
7
0
30 May 2024
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi
Vinija Jain
Mingye Gao
Malavika Srikanth
Aman Chadha
OffRL
125
5
0
15 May 2024
Decoding Speculative Decoding
Minghao Yan
Saurabh Agarwal
Shivaram Venkataraman
LRM
92
10
0
02 Feb 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
152
165
0
26 Jan 2024
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding
Heming Xia
Zhe Yang
Qingxiu Dong
Peiyi Wang
Chak Tou Leong
Tao Ge
Tianyu Liu
Wenjie Li
Zhifang Sui
LRM
160
130
0
15 Jan 2024
1