Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2405.13019
Cited By
A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models
15 May 2024
Mahsa Khoshnoodi
Vinija Jain
Mingye Gao
Malavika Srikanth
Aman Chadha
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"A Comprehensive Survey of Accelerated Generation Techniques in Large Language Models"
16 / 16 papers shown
Title
TriForce: Lossless Acceleration of Long Sequence Generation with Hierarchical Speculative Decoding
Hanshi Sun
Zhuoming Chen
Xinyu Yang
Yuandong Tian
Beidi Chen
62
55
0
18 Apr 2024
Medusa: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads
Tianle Cai
Yuhong Li
Zhengyang Geng
Hongwu Peng
Jason D. Lee
De-huai Chen
Tri Dao
82
275
0
19 Jan 2024
Lookahead: An Inference Acceleration Framework for Large Language Model with Lossless Generation Accuracy
Yao-Min Zhao
Zhitian Xie
Chen Liang
Chenyi Zhuang
Jinjie Gu
79
12
0
20 Dec 2023
Accelerating LLM Inference with Staged Speculative Decoding
Benjamin Spector
Christal Re
27
103
0
08 Aug 2023
Predictive Pipelined Decoding: A Compute-Latency Trade-off for Exact LLM Decoding
Seongjun Yang
Gibbeum Lee
Jaewoong Cho
Dimitris Papailiopoulos
Kangwook Lee
38
34
0
12 Jul 2023
SkipDecode: Autoregressive Skip Decoding with Batching and Caching for Efficient LLM Inference
Luciano Del Corro
Allison Del Giorno
Sahaj Agarwal
Ting Yu
Ahmed Hassan Awadallah
Subhabrata Mukherjee
52
57
0
05 Jul 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
176
4,085
0
09 Jun 2023
Fast Inference from Transformers via Speculative Decoding
Yaniv Leviathan
Matan Kalman
Yossi Matias
LRM
54
663
0
30 Nov 2022
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Tianxiang Sun
Xiangyang Liu
Wei-wei Zhu
Zhichao Geng
Lingling Wu
Yilong He
Yuan Ni
Guotong Xie
Xuanjing Huang
Xipeng Qiu
54
41
0
03 Mar 2022
A Gentle Introduction to Conformal Prediction and Distribution-Free Uncertainty Quantification
Anastasios Nikolas Angelopoulos
Stephen Bates
OOD
84
607
0
15 Jul 2021
Big Bird: Transformers for Longer Sequences
Manzil Zaheer
Guru Guruganesh
Kumar Avinava Dubey
Joshua Ainslie
Chris Alberti
...
Philip Pham
Anirudh Ravula
Qifan Wang
Li Yang
Amr Ahmed
VLM
446
2,051
0
28 Jul 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu
Peng Zhou
Zhe Zhao
Zhiruo Wang
Haotang Deng
Qi Ju
66
356
0
05 Apr 2020
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh
Lysandre Debut
Julien Chaumond
Thomas Wolf
66
7,386
0
02 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
30
1,838
0
23 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
270
1,861
0
17 Sep 2019
Fast Decoding in Sequence Models using Discrete Latent Variables
Łukasz Kaiser
Aurko Roy
Ashish Vaswani
Niki Parmar
Samy Bengio
Jakob Uszkoreit
Noam M. Shazeer
33
231
0
09 Mar 2018
1