Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2304.04487
Cited By
Inference with Reference: Lossless Acceleration of Large Language Models
10 April 2023
Nan Yang
Tao Ge
Liang Wang
Binxing Jiao
Daxin Jiang
Linjun Yang
Rangan Majumder
Furu Wei
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Inference with Reference: Lossless Acceleration of Large Language Models"
41 / 41 papers shown
Title
MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models
Mugilan Ganesan
Shri Kiran Srinivasan
Ankur Aggarwal
Nish Sinnadurai
Sean Lie
Vithursan Thangarasa
VLM
34
0
0
15 May 2025
Speculative Decoding and Beyond: An In-Depth Survey of Techniques
Y. Hu
Zining Liu
Zhenyuan Dong
Tianfan Peng
Bradley McDanel
Shanghang Zhang
93
0
0
27 Feb 2025
Towards Optimal Multi-draft Speculative Decoding
Zhibo Hu
Tong Zheng
Vignesh Viswanathan
Ziyi Chen
Ryan Rossi
Yihan Wu
Dinesh Manocha
Heng Huang
47
4
0
26 Feb 2025
Deploying Foundation Model Powered Agent Services: A Survey
Wenchao Xu
Jinyu Chen
Peirong Zheng
Xiaoquan Yi
Tianyi Tian
...
Quan Wan
Yining Qi
Yunfeng Fan
Qinliang Su
Xuemin Shen
AI4CE
119
1
0
18 Dec 2024
SSSD: Simply-Scalable Speculative Decoding
Michele Marzollo
Jiawei Zhuang
Niklas Roemer
Lorenz K. Müller
Lukas Cavigelli
LRM
52
2
0
08 Nov 2024
SuffixDecoding: A Model-Free Approach to Speeding Up Large Language Model Inference
Gabriele Oliaro
Zhihao Jia
Daniel F Campos
Aurick Qiao
LRM
44
4
0
07 Nov 2024
CursorCore: Assist Programming through Aligning Anything
Hao Jiang
Qi Liu
Rui Li
Shengyu Ye
Shijin Wang
55
1
0
09 Oct 2024
ParallelSpec: Parallel Drafter for Efficient Speculative Decoding
Zilin Xiao
Hongming Zhang
Tao Ge
Siru Ouyang
Vicente Ordonez
Dong Yu
52
5
0
08 Oct 2024
Large Language Model Inference Acceleration: A Comprehensive Hardware Perspective
Jinhao Li
Jiaming Xu
Shan Huang
Yonghua Chen
Wen Li
...
Jiayi Pan
Li Ding
Hao Zhou
Yu Wang
Guohao Dai
62
17
0
06 Oct 2024
Hardware Acceleration of LLMs: A comprehensive survey and comparison
Nikoletta Koilia
C. Kachris
60
5
0
05 Sep 2024
Clover-2: Accurate Inference for Regressive Lightweight Speculative Decoding
Bin Xiao
Lujun Gui
Lei Su
Weipeng Chen
39
3
0
01 Aug 2024
EAGLE-2: Faster Inference of Language Models with Dynamic Draft Trees
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
96
57
0
24 Jun 2024
From Decoding to Meta-Generation: Inference-time Algorithms for Large Language Models
Sean Welleck
Amanda Bertsch
Matthew Finlayson
Hailey Schoelkopf
Alex Xie
Graham Neubig
Ilia Kulikov
Zaid Harchaoui
38
51
0
24 Jun 2024
SUBLLM: A Novel Efficient Architecture with Token Sequence Subsampling for LLM
Quandong Wang
Yuxuan Yuan
Xiaoyu Yang
Ruike Zhang
Kang Zhao
Wei Liu
Jian Luan
Daniel Povey
Bin Wang
55
0
0
03 Jun 2024
SpecDec++: Boosting Speculative Decoding via Adaptive Candidate Lengths
Kaixuan Huang
Xudong Guo
Mengdi Wang
52
20
0
30 May 2024
Nearest Neighbor Speculative Decoding for LLM Generation and Attribution
Minghan Li
Xilun Chen
Ari Holtzman
Beidi Chen
Jimmy Lin
Wen-tau Yih
Xi Lin
RALM
BDL
110
10
0
29 May 2024
A Comparative Analysis of Distributed Training Strategies for GPT-2
Ishan Patwardhan
Shubham Gandhi
Om M. Khare
Amit Joshi
Suraj Sawant
42
1
0
24 May 2024
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
Mahsa Khoshnoodi
Vinija Jain
Mingye Gao
Malavika Srikanth
Aman Chadha
OffRL
42
1
0
15 May 2024
Clover: Regressive Lightweight Speculative Decoding with Sequential Knowledge
Bin Xiao
Chunan Shi
Xiaonan Nie
Fan Yang
Xiangwei Deng
Lei Su
Weipeng Chen
Bin Cui
42
8
0
01 May 2024
Kangaroo: Lossless Self-Speculative Decoding via Double Early Exiting
Fangcheng Liu
Yehui Tang
Zhenhua Liu
Yunsheng Ni
Kai Han
Yunhe Wang
51
24
0
29 Apr 2024
Recurrent Drafter for Fast Speculative Decoding in Large Language Models
Aonan Zhang
Chong-Jun Wang
Yi Wang
Xuanyu Zhang
Yunfei Cheng
42
17
0
14 Mar 2024
LLM Inference Unveiled: Survey and Roofline Model Insights
Zhihang Yuan
Yuzhang Shang
Yang Zhou
Zhen Dong
Zhe Zhou
...
Yong Jae Lee
Yan Yan
Beidi Chen
Guangyu Sun
Kurt Keutzer
61
82
0
26 Feb 2024
PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition
Jinghui Lu
Ziwei Yang
Yanjie Wang
Xuejing Liu
Brian Mac Namee
Can Huang
MoE
58
5
0
07 Feb 2024
A Survey on Transformer Compression
Yehui Tang
Yunhe Wang
Jianyuan Guo
Zhijun Tu
Kai Han
Hailin Hu
Dacheng Tao
46
30
0
05 Feb 2024
GliDe with a CaPE: A Low-Hassle Method to Accelerate Speculative Decoding
Cunxiao Du
Jing Jiang
Yuanchen Xu
Jiawei Wu
Sicheng Yu
...
Shenggui Li
Kai Xu
Liqiang Nie
Zhaopeng Tu
Yang You
42
30
0
03 Feb 2024
Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
Yichao Fu
Peter Bailis
Ion Stoica
Hao Zhang
133
145
0
03 Feb 2024
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
52
128
0
26 Jan 2024
Accelerating Retrieval-Augmented Language Model Serving with Speculation
Zhihao Zhang
Alan Zhu
Lijie Yang
Yihua Xu
Lanting Li
P. Phothilimthana
Zhihao Jia
RALM
KELM
56
16
0
25 Jan 2024
MambaByte: Token-free Selective State Space Model
Junxiong Wang
Tushaar Gangavarapu
Jing Nathan Yan
Alexander M. Rush
Mamba
46
37
0
24 Jan 2024
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao
Zhitian Xie
Chen Liang
Chenyi Zhuang
Jinjie Gu
70
12
0
20 Dec 2023
Efficient Deep Speech Understanding at the Edge
Rongxiang Wang
Felix Lin
21
2
0
22 Nov 2023
Improving Machine Translation with Large Language Models: A Preliminary Study with Cooperative Decoding
Jiali Zeng
Fandong Meng
Yongjing Yin
Jie Zhou
35
11
0
06 Nov 2023
SpecTr: Fast Speculative Decoding via Optimal Transport
Ziteng Sun
A. Suresh
Jae Hun Ro
Ahmad Beirami
Himanshu Jain
Felix X. Yu
55
68
0
23 Oct 2023
Large Search Model: Redefining Search Stack in the Era of LLMs
Liang Wang
Nan Yang
Xiaolong Huang
Linjun Yang
Rangan Majumder
Furu Wei
LRM
KELM
50
13
0
23 Oct 2023
LLMLingua: Compressing Prompts for Accelerated Inference of Large Language Models
Huiqiang Jiang
Qianhui Wu
Chin-Yew Lin
Yuqing Yang
Lili Qiu
53
104
0
09 Oct 2023
SCALE: Synergized Collaboration of Asymmetric Language Translation Engines
Xin Cheng
Xun Wang
Tao Ge
Si-Qing Chen
Heng Chang
Dongyan Zhao
Rui Yan
69
2
0
29 Sep 2023
Large Language Models and Knowledge Graphs: Opportunities and Challenges
Jeff Z. Pan
Simon Razniewski
Jan-Christoph Kalo
Sneha Singhania
Jiaoyan Chen
...
Gerard de Melo
A. Bonifati
Edlira Vakaj
M. Dragoni
D. Graux
KELM
35
73
0
11 Aug 2023
Copy Is All You Need
Tian Lan
Deng Cai
Yan Wang
Heyan Huang
Xian-Ling Mao
35
27
0
13 Jul 2023
SpecInfer: Accelerating Generative Large Language Model Serving with Tree-based Speculative Inference and Verification
Xupeng Miao
Gabriele Oliaro
Zhihao Zhang
Xinhao Cheng
Zeyu Wang
...
Chunan Shi
Zhuoming Chen
Daiyaan Arfeen
Reyna Abhyankar
Zhihao Jia
LRM
68
122
0
16 May 2023
Lossless Acceleration for Seq2seq Generation with Aggressive Decoding
Tao Ge
Heming Xia
Xin Sun
Si-Qing Chen
Furu Wei
85
18
0
20 May 2022
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu
Wangchunshu Zhou
Tao Ge
Furu Wei
Ming Zhou
234
198
0
07 Feb 2020
1