Deep Batch Active Learning by Diverse, Uncertain Gradient Lower Bounds. Jordan T. Ash, Chicheng Zhang, A. Krishnamurthy, John Langford, Alekh Agarwal. 09 Jun 2019.
Energy and Policy Considerations for Deep Learning in NLP. Emma Strubell, Ananya Ganesh, Andrew McCallum. 05 Jun 2019.
Efficient 8-Bit Quantization of Transformer Neural Machine Language Translation Model. Aishwarya Bhandare, Vamsi Sripathi, Deepthi Karkada, Vivek V. Menon, Sun Choi, Kushal Datta, V. Saletore. 03 Jun 2019.
A Study of BFLOAT16 for Deep Learning Training. Dhiraj D. Kalamkar, Dheevatsa Mudigere, Naveen Mellempudi, Dipankar Das, K. Banerjee, ..., Sudarshan Srinivasan, Abhisek Kundu, M. Smelyanskiy, Bharat Kaul, Pradeep Dubey. 29 May 2019.
Simple and Effective Curriculum Pointer-Generator Networks for Reading Comprehension over Long Narratives. Yi Tay, Shuohang Wang, Anh Tuan Luu, Jie Fu, Minh C. Phan, Xingdi Yuan, J. Rao, S. Hui, Aston Zhang. 26 May 2019.
Are Sixteen Heads Really Better than One? Paul Michel, Omer Levy, Graham Neubig. 25 May 2019.
Analyzing Multi-Head Self-Attention: Specialized Heads Do the Heavy Lifting, the Rest Can Be Pruned. Elena Voita, David Talbot, F. Moiseev, Rico Sennrich, Ivan Titov. 23 May 2019.
Curriculum Learning for Domain Adaptation in Neural Machine Translation. Xuan Zhang, Pamela Shapiro, Manish Kumar, Paul McNamee, Marine Carpuat, Kevin Duh. 14 May 2019.
Sparse Sequence-to-Sequence Models. Ben Peters, Vlad Niculae, André F. T. Martins. 14 May 2019.
Generating Long Sequences with Sparse Transformers. R. Child, Scott Gray, Alec Radford, Ilya Sutskever. 23 Apr 2019.
Competence-based Curriculum Learning for Neural Machine Translation. Emmanouil Antonios Platanios, Otilia Stretcu, Graham Neubig, Barnabás Póczós, Tom Michael Mitchell. 23 Mar 2019.
The State of Sparsity in Deep Neural Networks. Trevor Gale, Erich Elsen, Sara Hooker. 25 Feb 2019.
Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization. Hesham Mostafa, Xin Wang. 15 Feb 2019.
Parameter-Efficient Transfer Learning for NLP. N. Houlsby, A. Giurgiu, Stanislaw Jastrzebski, Bruna Morrone, Quentin de Laroussilhe, Andrea Gesmundo, Mona Attariyan, Sylvain Gelly. 02 Feb 2019.
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 09 Jan 2019.
Rethinking ImageNet Pre-training. Kaiming He, Ross B. Girshick, Piotr Dollár. 21 Nov 2018.
A System for Massively Parallel Hyperparameter Tuning. Liam Li, Kevin Jamieson, Afshin Rostamizadeh, Ekaterina Gonina, Moritz Hardt, Benjamin Recht, Ameet Talwalkar. 13 Oct 2018.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018.
SWAG: A Large-Scale Adversarial Dataset for Grounded Commonsense Inference. Rowan Zellers, Yonatan Bisk, Roy Schwartz, Yejin Choi. 16 Aug 2018.
Practical Obstacles to Deploying Active Learning. David Lowell, Zachary Chase Lipton, Byron C. Wallace. 12 Jul 2018.
Universal Transformers. Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser. 10 Jul 2018.
Measuring the Intrinsic Dimension of Objective Landscapes. Chunyuan Li, Heerad Farkhoor, Rosanne Liu, J. Yosinski. 24 Apr 2018.
Pieces of Eight: 8-bit Neural Machine Translation. Jerry Quinn, Miguel Ballesteros. 13 Apr 2018.
Datasheets for Datasets. Timnit Gebru, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna M. Wallach, Hal Daumé, Kate Crawford. 23 Mar 2018.
Self-Attention with Relative Position Representations. Peter Shaw, Jakob Uszkoreit, Ashish Vaswani. 06 Mar 2018.
Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. 15 Feb 2018.
Learning Sparse Neural Networks through $L_0$ Regularization. Christos Louizos, Max Welling, Diederik P. Kingma. 04 Dec 2017.
Mixed Precision Training. Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu. 10 Oct 2017.
Reporting Score Distributions Makes a Difference: Performance Study of LSTM-networks for Sequence Tagging. Nils Reimers, Iryna Gurevych. 31 Jul 2017.
An Overview of Multi-Task Learning in Deep Neural Networks. Sebastian Ruder. 15 Jun 2017.
Learning multiple visual domains with residual adapters. Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi. 22 May 2017.
Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon. Xin Dong, Shangyu Chen, Sinno Jialin Pan. 22 May 2017.
Search Engine Guided Non-Parametric Neural Machine Translation. Jiatao Gu, Yong Wang, Kyunghyun Cho, Victor O.K. Li. 20 May 2017.
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam. 17 Apr 2017.
Learning to Generate Reviews and Discovering Sentiment. Alec Radford, Rafal Jozefowicz, Ilya Sutskever. 05 Apr 2017.
Deep Bayesian Active Learning with Image Data. Y. Gal, Riashat Islam, Zoubin Ghahramani. 08 Mar 2017.
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. Noam M. Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc V. Le, Geoffrey E. Hinton, J. Dean. 23 Jan 2017.
Sequence-Level Knowledge Distillation. Yoon Kim, Alexander M. Rush. 25 Jun 2016.
A large annotated corpus for learning natural language inference. Samuel R. Bowman, Gabor Angeli, Christopher Potts, Christopher D. Manning. 21 Aug 2015.
Learning both Weights and Connections for Efficient Neural Networks. Song Han, Jeff Pool, J. Tran, W. Dally. 08 Jun 2015.
Distilling the Knowledge in a Neural Network. Geoffrey E. Hinton, Oriol Vinyals, J. Dean. 09 Mar 2015.
Non-stochastic Best Arm Identification and Hyperparameter Optimization. Kevin Jamieson, Ameet Talwalkar. 27 Feb 2015.
Practical Bayesian Optimization of Machine Learning Algorithms. Jasper Snoek, Hugo Larochelle, Ryan P. Adams. 13 Jun 2012.
Sample Selection Bias Correction Theory. Corinna Cortes, M. Mohri, Michael Riley, Afshin Rostamizadeh. 19 May 2008.