Efficient Methods for Natural Language Processing: A Survey (arXiv 2209.00099)
31 August 2022
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, Manuel R. Ciosici, Michael Hassid, Kenneth Heafield, Sara Hooker, Colin Raffel, Pedro Henrique Martins, André F. T. Martins, Jessica Zosa Forde, Peter Milder, Edwin Simpson, Noam Slonim, Jesse Dodge, Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz
Papers citing "Efficient Methods for Natural Language Processing: A Survey" (50 of 244 shown):
"Power Consumption Variation over Activation Functions." Leon Derczynski. 12 Jun 2020. 7 citations.
"Knowledge Distillation: A Survey." Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao. 09 Jun 2020. [VLM] 2,976 citations.
"HAT: Hardware-Aware Transformers for Efficient Natural Language Processing." Hanrui Wang, Zhanghao Wu, Zhijian Liu, Han Cai, Ligeng Zhu, Chuang Gan, Song Han. 28 May 2020. 262 citations.
"Language Models are Few-Shot Learners." Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. 28 May 2020. [BDL] 42,332 citations.
"Movement Pruning: Adaptive Sparsity by Fine-Tuning." Victor Sanh, Thomas Wolf, Alexander M. Rush. 15 May 2020. 486 citations.
"Empowering Active Learning to Jointly Optimize System and User Demands." Ji-Ung Lee, Christian M. Meyer, Iryna Gurevych. 09 May 2020. 11 citations.
"GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference." Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos. 08 May 2020. [MQ] 188 citations.
"Active Learning for Coreference Resolution using Discrete Annotation." Belinda Z. Li, Gabriel Stanovsky, Luke Zettlemoyer. 28 Apr 2020. 26 citations.
"DeeBERT: Dynamic Early Exiting for Accelerating BERT Inference." Ji Xin, Raphael Tang, Jaejun Lee, Yaoliang Yu, Jimmy J. Lin. 27 Apr 2020. 375 citations.
"Lite Transformer with Long-Short Range Attention." Zhanghao Wu, Zhijian Liu, Ji Lin, Chengyue Wu, Song Han. 24 Apr 2020. 322 citations.
"The Right Tool for the Job: Matching Model and Instance Complexities." Roy Schwartz, Gabriel Stanovsky, Swabha Swayamdipta, Jesse Dodge, Noah A. Smith. 16 Apr 2020. 169 citations.
"Training with Quantization Noise for Extreme Model Compression." Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin. 15 Apr 2020. [MQ] 245 citations.
"ProFormer: Towards On-Device LSH Projection Based Transformers." Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva. 13 Apr 2020. 9 citations.
"Reinforced Curriculum Learning on Pre-trained Neural Machine Translation Models." Mingjun Zhao, Haijiang Wu, Di Niu, Xiaoli Wang. 13 Apr 2020. 42 citations.
"Longformer: The Long-Document Transformer." Iz Beltagy, Matthew E. Peters, Arman Cohan. 10 Apr 2020. [RALM, VLM] 4,090 citations.
"On the Effect of Dropping Layers of Pre-trained Transformer Models." Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov. 08 Apr 2020. 141 citations.
"MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices." Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou. 06 Apr 2020. [MQ] 817 citations.
"FastBERT: a Self-distilling BERT with Adaptive Inference Time." Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju. 05 Apr 2020. 360 citations.
"Efficient Content-Based Sparse Attention with Routing Transformers." Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier. 12 Mar 2020. [MoE] 601 citations.
"What is the State of Neural Network Pruning?" Davis W. Blalock, Jose Javier Gonzalez Ortiz, Jonathan Frankle, John Guttag. 06 Mar 2020. 1,053 citations.
"A³: Accelerating Attention Mechanisms in Neural Networks with Approximation." Tae Jun Ham, Sungjun Jung, Seonghak Kim, Young H. Oh, Yeonhong Park, ..., Jung-Hun Park, Sanghee Lee, Kyoung Park, Jae W. Lee, D. Jeong. 22 Feb 2020. 219 citations.
"Balancing Cost and Benefit with Tied-Multi Transformers." Raj Dabre, Raphaël Rubino, Atsushi Fujita. 20 Feb 2020. 6 citations.
"Compressing BERT: Studying the Effects of Weight Pruning on Transfer Learning." Mitchell A. Gordon, Kevin Duh, Nicholas Andrews. 19 Feb 2020. [VLM] 342 citations.
"Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping." Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith. 15 Feb 2020. 597 citations.
"Adversarial Filters of Dataset Biases." Ronan Le Bras, Swabha Swayamdipta, Chandra Bhagavatula, Rowan Zellers, Matthew E. Peters, Ashish Sabharwal, Yejin Choi. 10 Feb 2020. 223 citations.
"Towards the Systematic Reporting of the Energy and Carbon Footprints of Machine Learning." Peter Henderson, Jie Hu, Joshua Romoff, Emma Brunskill, Dan Jurafsky, Joelle Pineau. 31 Jan 2020. 456 citations.
"Scaling Laws for Neural Language Models." Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020. 4,893 citations.
"Reformer: The Efficient Transformer." Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya. 13 Jan 2020. [VLM] 2,327 citations.
"Compressive Transformers for Long-Range Sequence Modelling." Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap. 13 Nov 2019. [RALM, VLM, KELM] 652 citations.
"Location Attention for Extrapolation to Longer Sequences." Yann Dubois, Gautier Dagan, Dieuwke Hupkes, Elia Bruni. 10 Nov 2019. 43 citations.
"Generalization through Memorization: Nearest Neighbor Language Models." Urvashi Khandelwal, Omer Levy, Dan Jurafsky, Luke Zettlemoyer, M. Lewis. 01 Nov 2019. [RALM] 842 citations.
"BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension." M. Lewis, Yinhan Liu, Naman Goyal, Marjan Ghazvininejad, Abdel-rahman Mohamed, Omer Levy, Veselin Stoyanov, Luke Zettlemoyer. 29 Oct 2019. [AIMat, VLM] 10,848 citations.
"Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer." Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 23 Oct 2019. [AIMat] 20,298 citations.
"Depth-Adaptive Transformer." Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli. 22 Oct 2019. 193 citations.
"Quantifying the Carbon Emissions of Machine Learning." Alexandre Lacoste, A. Luccioni, Victor Schmidt, Thomas Dandres. 21 Oct 2019. 708 citations.
"Fully Quantized Transformer for Machine Translation." Gabriele Prato, Ella Charlaix, Mehdi Rezagholizadeh. 17 Oct 2019. [MQ] 70 citations.
"Q8BERT: Quantized 8Bit BERT." Ofir Zafrir, Guy Boudoukh, Peter Izsak, Moshe Wasserblat. 14 Oct 2019. [MQ] 505 citations.
"Structured Pruning of Large Language Models." Ziheng Wang, Jeremy Wohlwend, Tao Lei. 10 Oct 2019. 291 citations.
"DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter." Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf. 02 Oct 2019. 7,547 citations.
"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations." Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut. 26 Sep 2019. [SSL, AIMat] 6,463 citations.
"Reducing Transformer Depth on Demand with Structured Dropout." Angela Fan, Edouard Grave, Armand Joulin. 25 Sep 2019. 596 citations.
"TinyBERT: Distilling BERT for Natural Language Understanding." Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu. 23 Sep 2019. [VLM] 1,869 citations.
"Simple, Scalable Adaptation for Neural Machine Translation." Ankur Bapna, N. Arivazhagan, Orhan Firat. 18 Sep 2019. [AI4CE] 417 citations.
"Show Your Work: Improved Reporting of Experimental Results." Jesse Dodge, Suchin Gururangan, Dallas Card, Roy Schwartz, Noah A. Smith. 06 Sep 2019. 255 citations.
"Language Models as Knowledge Bases?" Fabio Petroni, Tim Rocktaschel, Patrick Lewis, A. Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel. 03 Sep 2019. [KELM, AI4MH] 2,673 citations.
"Adaptively Sparse Transformers." Gonçalo M. Correia, Vlad Niculae, André F. T. Martins. 30 Aug 2019. 256 citations.
"Revealing the Dark Secrets of BERT." Olga Kovaleva, Alexey Romanov, Anna Rogers, Anna Rumshisky. 21 Aug 2019. 554 citations.
"Green AI." Roy Schwartz, Jesse Dodge, Noah A. Smith, Oren Etzioni. 22 Jul 2019. 1,149 citations.
"Discriminative Active Learning." Daniel Gissin, Shai Shalev-Shwartz. 15 Jul 2019. 178 citations.
"BatchBALD: Efficient and Diverse Batch Acquisition for Deep Bayesian Active Learning." Andreas Kirsch, Joost R. van Amersfoort, Y. Gal. 19 Jun 2019. [FedML] 629 citations.