2407.21082
Accelerating Large Language Model Inference with Self-Supervised Early Exits
30 July 2024
Florian Valade
LRM
Papers citing "Accelerating Large Language Model Inference with Self-Supervised Early Exits"
9 / 9 papers shown
From Words to Watts: Benchmarking the Energy Costs of Large Language Model Inference
S. Samsi, Dan Zhao, Joseph McDonald, Baolin Li, Adam Michaleas, Michael Jones, William Bergeron, J. Kepner, Devesh Tiwari, V. Gadepally
65
150
0
04 Oct 2023
Robust Speech Recognition via Large-Scale Weak Supervision
Alec Radford, Jong Wook Kim, Tao Xu, Greg Brockman, C. McLeavey, Ilya Sutskever
OffRL
203
3,732
0
06 Dec 2022
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He
VLM
MQ
127
479
0
04 Jun 2022
LaMDA: Language Models for Dialog Applications
R. Thoppilan, Daniel De Freitas, Jamie Hall, Noam M. Shazeer, Apoorv Kulshreshtha, ..., Blaise Aguera-Arcas, Claire Cui, M. Croak, Ed H. Chi, Quoc Le
ALM
140
1,601
0
20 Jan 2022
BinaryBERT: Pushing the Limit of BERT Quantization
Haoli Bai, Wei Zhang, Lu Hou, Lifeng Shang, Jing Jin, Xin Jiang, Qun Liu, Michael Lyu, Irwin King
MQ
214
227
0
31 Dec 2020
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
47
342
0
07 Jun 2020
MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
MQ
115
817
0
06 Apr 2020
FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
84
360
0
05 Apr 2020
Reducing Transformer Depth on Demand with Structured Dropout
Angela Fan, Edouard Grave, Armand Joulin
120
596
0
25 Sep 2019