2401.15077
EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty
26 January 2024
Yuhui Li
Fangyun Wei
Chao Zhang
Hongyang R. Zhang
Papers citing
"EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty"
12 / 162 papers shown
Title
Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding
Xin Sun
Tao Ge
Furu Wei
Houfeng Wang
101
64
0
09 Jun 2021
I-BERT: Integer-only BERT Quantization
Sehoon Kim
A. Gholami
Z. Yao
Michael W. Mahoney
Kurt Keutzer
MQ
179
354
0
05 Jan 2021
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
107
489
0
15 May 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh
Isak Edo
Omar Mohamed Awad
Andreas Moshovos
MQ
82
190
0
08 May 2020
Lite Transformer with Long-Short Range Attention
Zhanghao Wu
Zhijian Liu
Ji Lin
Chengyue Wu
Song Han
62
323
0
24 Apr 2020
Fast Transformer Decoding: One Write-Head is All You Need
Noam M. Shazeer
172
479
0
06 Nov 2019
Q8BERT: Quantized 8Bit BERT
Ofir Zafrir
Guy Boudoukh
Peter Izsak
Moshe Wasserblat
MQ
112
507
0
14 Oct 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned
Elena Voita
David Talbot
F. Moiseev
Rico Sennrich
Ivan Titov
159
1,153
0
23 May 2019
The State of Sparsity in Deep Neural Networks
Trevor Gale
Erich Elsen
Sara Hooker
193
765
0
25 Feb 2019
Blockwise Parallel Decoding for Deep Autoregressive Models
Mitchell Stern
Noam M. Shazeer
Jakob Uszkoreit
86
238
0
07 Nov 2018
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations
Itay Hubara
Matthieu Courbariaux
Daniel Soudry
Ran El-Yaniv
Yoshua Bengio
MQ
230
1,874
0
22 Sep 2016
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
369
19,808
0
09 Mar 2015