EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty

26 January 2024

Hongyang R. Zhang

Papers citing "EAGLE: Speculative Sampling Requires Rethinking Feature Uncertainty"

12 / 162 papers shown

Title | Authors | Tag | Counts | Date
Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding | Xin Sun, Tao Ge, Furu Wei, Houfeng Wang | | 101, 64, 0 | 09 Jun 2021
I-BERT: Integer-only BERT Quantization | Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer | MQ | 179, 354, 0 | 05 Jan 2021
Movement Pruning: Adaptive Sparsity by Fine-Tuning | Victor Sanh, Thomas Wolf, Alexander M. Rush | | 107, 489, 0 | 15 May 2020
GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference | Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos | MQ | 82, 190, 0 | 08 May 2020
Lite Transformer with Long-Short Range Attention | Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han | | 62, 323, 0 | 24 Apr 2020
Fast Transformer Decoding: One Write-Head is All You Need | Noam M. Shazeer | | 172, 479, 0 | 06 Nov 2019
Q8BERT: Quantized 8Bit BERT | Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat | MQ | 112, 507, 0 | 14 Oct 2019
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned | Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov | | 159, 1,153, 0 | 23 May 2019
The State of Sparsity in Deep Neural Networks | Trevor Gale, Erich Elsen, Sara Hooker | | 193, 765, 0 | 25 Feb 2019
Blockwise Parallel Decoding for Deep Autoregressive Models | Mitchell Stern, Noam M. Shazeer, Ashley J. Llorens | | 86, 238, 0 | 07 Nov 2018
Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations | Itay Hubara, Matthieu Courbariaux, Daniel Soudry, Ran El-Yaniv, Yoshua Bengio | MQ | 230, 1,874, 0 | 22 Sep 2016
Distilling the Knowledge in a Neural Network | Geoffrey E. Hinton, Oriol Vinyals, J. Dean | FedML | 369, 19,808, 0 | 09 Mar 2015