ResearchTrend.AI
Don't Decay the Learning Rate, Increase the Batch Size
arXiv:1711.00489 (v2, latest) · 1 November 2017 · [ODL]
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
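The titular recipe can be sketched in a few lines: instead of dividing the learning rate by some factor at each milestone epoch, multiply the batch size by the same factor and keep the learning rate fixed, so the scale of the SGD gradient noise decays identically. The milestone epochs, base values, and factor below are illustrative placeholders, not values taken from the paper:

```python
def batch_size_schedule(epoch, base_batch=128, base_lr=0.1,
                        milestones=(30, 60, 80), factor=5):
    """Return (batch_size, learning_rate) for a given epoch.

    A conventional schedule would divide base_lr by `factor` at each
    milestone; here the batch size is multiplied by `factor` instead,
    which shrinks the SGD noise scale (~ lr / batch_size) the same way.
    """
    k = sum(epoch >= m for m in milestones)  # milestones already passed
    return base_batch * factor ** k, base_lr

# Usage: at epoch 0 the batch is 128; after the first milestone it
# becomes 640, then 3200, while the learning rate stays at 0.1.
```

In practice the batch size cannot grow past memory or dataset limits, so the paper's experiments eventually fall back to decaying the learning rate once the batch size saturates.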

Papers citing "Don't Decay the Learning Rate, Increase the Batch Size"

50 / 454 papers shown

  • A Hitchhiker's Guide On Distributed Training of Deep Neural Networks (28 Oct 2018): K. Chahal, Manraj Singh Grover, Kuntal Dey [3DHOOD]
  • Applying Deep Learning To Airbnb Search (22 Oct 2018): Malay Haldar, Mustafa Abdool, Prashant Ramanathan, Tao Xu, Shulin Yang, ..., Qing Zhang, Nick Barrow-Williams, B. Turnbull, Brendan M. Collins, Thomas Legrand [DML]
  • A Modern Take on the Bias-Variance Tradeoff in Neural Networks (19 Oct 2018): Brady Neal, Sarthak Mittal, A. Baratin, Vinayak Tantia, Matthew Scicluna, Simon Lacoste-Julien, Ioannis Mitliagkas
  • Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD (19 Oct 2018): Jianyu Wang, Gauri Joshi [FedML]
  • Approximate Fisher Information Matrix to Characterise the Training of Deep Neural Networks (16 Oct 2018): Zhibin Liao, Tom Drummond, Ian Reid, G. Carneiro
  • Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning (02 Oct 2018): Charles H. Martin, Michael W. Mahoney [AI4CE]
  • Large batch size training of neural networks with adversarial training and second-order information (02 Oct 2018): Z. Yao, A. Gholami, Daiyaan Arfeen, Richard Liaw, Joseph E. Gonzalez, Kurt Keutzer, Michael W. Mahoney [ODL]
  • Dynamic Sparse Graph for Efficient Deep Learning (01 Oct 2018): Liu Liu, Lei Deng, Xing Hu, Maohua Zhu, Guoqi Li, Yufei Ding, Yuan Xie [GNN]
  • Directional Analysis of Stochastic Gradient Descent via von Mises-Fisher Distributions in Deep learning (29 Sep 2018): Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang
  • Discovering Low-Precision Networks Close to Full-Precision Networks for Efficient Embedded Inference (11 Sep 2018): J. McKinstry, S. K. Esser, R. Appuswamy, Deepika Bablani, John V. Arthur, Izzet B. Yildiz, D. Modha [MQ]
  • Normalization in Training U-Net for 2D Biomedical Semantic Segmentation (11 Sep 2018): Xiao-Yun Zhou, Guang-Zhong Yang
  • Single-Microphone Speech Enhancement and Separation Using Deep Learning (31 Aug 2018): Morten Kolbaek
  • The University of Cambridge's Machine Translation Systems for WMT18 (28 Aug 2018): Felix Stahlberg, Adria de Gispert, Bill Byrne
  • Don't Use Large Mini-Batches, Use Local SGD (22 Aug 2018): Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
  • Fast, Better Training Trick -- Random Gradient (13 Aug 2018): Jiakai Wei [ODL]
  • Large Scale Language Modeling: Converging on 40GB of Text in Four Hours (03 Aug 2018): Raul Puri, Robert M. Kirby, Nikolai Yakovenko, Bryan Catanzaro
  • Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes (30 Jul 2018): Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
  • An argument in favor of strong scaling for deep neural networks with small datasets (24 Jul 2018): R. L. F. Cunha, Eduardo Rodrigues, Matheus Palhares Viana, Dario Augusto Borges Oliveira
  • Trust-Region Algorithms for Training Responses: Machine Learning Methods Using Indefinite Hessian Approximations (01 Jul 2018): Jennifer B. Erway, J. Griffin, Roummel F. Marcia, Riadh Omheni
  • Stochastic natural gradient descent draws posterior samples in function space (25 Jun 2018): Samuel L. Smith, Daniel Duckworth, Semon Rezchikov, Quoc V. Le, Jascha Narain Sohl-Dickstein [BDL]
  • Pushing the boundaries of parallel Deep Learning -- A practical approach (25 Jun 2018): Paolo Viviani, M. Drocco, Marco Aldinucci [OOD]
  • Character-Level Feature Extraction with Densely Connected Networks (24 Jun 2018): Chanhee Lee, Young-Bum Kim, Dongyub Lee, Heuiseok Lim [3DV]
  • Kernel machines that adapt to GPUs for effective large batch training (15 Jun 2018): Siyuan Ma, M. Belkin
  • Perturbative Neural Networks (05 Jun 2018): Felix Juefei Xu, Vishnu Boddeti, Marios Savvides
  • Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate (05 Jun 2018): Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry [FedMLMLT]
  • Analysis of DAWNBench, a Time-to-Accuracy Machine Learning Performance Benchmark (04 Jun 2018): Cody Coleman, Daniel Kang, Deepak Narayanan, Luigi Nardi, Tian Zhao, Jian Zhang, Peter Bailis, K. Olukotun, Christopher Ré, Matei A. Zaharia
  • Implicit Bias of Gradient Descent on Linear Convolutional Networks (01 Jun 2018): Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro [MDE]
  • Scaling Neural Machine Translation (01 Jun 2018): Myle Ott, Sergey Edunov, David Grangier, Michael Auli [AIMat]
  • Understanding Batch Normalization (01 Jun 2018): Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger
  • Gradient Energy Matching for Distributed Asynchronous Gradient Descent (22 May 2018): Joeri Hermans, Gilles Louppe
  • SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning (21 May 2018): W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, H. Li
  • Multi-representation Ensembles and Delayed SGD Updates Improve Syntax-based NMT (01 May 2018): Danielle Saunders, Felix Stahlberg, Adria de Gispert, Bill Byrne
  • SimpleQuestions Nearly Solved: A New Upperbound and Baseline Approach (24 Apr 2018): Michael Petrochuk, Luke Zettlemoyer
  • BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism (23 Apr 2018): Nicolas Weber, F. Schmidt, Mathias Niepert, Felipe Huici
  • Revisiting Small Batch Training for Deep Neural Networks (20 Apr 2018): Dominic Masters, Carlo Luschi [ODL]
  • μ-cuDNN: Accelerating Deep Learning Frameworks with Micro-Batching (13 Apr 2018): Yosuke Oyama, Tal Ben-Nun, Torsten Hoefler, Satoshi Matsuoka
  • Training Tips for the Transformer Model (01 Apr 2018): Martin Popel, Ondrej Bojar
  • A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay (26 Mar 2018): L. Smith
  • Norm matters: efficient and accurate normalization schemes in deep networks (05 Mar 2018): Elad Hoffer, Ron Banner, Itay Golan, Daniel Soudry [OffRL]
  • Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis (26 Feb 2018): Tal Ben-Nun, Torsten Hoefler [GNN]
  • A Walk with SGD (24 Feb 2018): Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio
  • Computation of optimal transport and related hedging problems via penalization and neural networks (23 Feb 2018): Stephan Eckstein, Michael Kupper [OT]
  • Characterizing Implicit Bias in Terms of Optimization Geometry (22 Feb 2018): Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro [AI4CE]
  • Hessian-based Analysis of Large Batch Training and Robustness to Adversaries (22 Feb 2018): Z. Yao, A. Gholami, Qi Lei, Kurt Keutzer, Michael W. Mahoney
  • A Progressive Batching L-BFGS Method for Machine Learning (15 Feb 2018): Raghu Bollapragada, Dheevatsa Mudigere, J. Nocedal, Hao-Jun Michael Shi, P. T. P. Tang [ODL]
  • On Characterizing the Capacity of Neural Networks using Algebraic Topology (13 Feb 2018): William H. Guss, Ruslan Salakhutdinov
  • On Scale-out Deep Learning Training for Cloud and HPC (24 Jan 2018): Srinivas Sridharan, K. Vaidyanathan, Dhiraj D. Kalamkar, Dipankar Das, Mikhail E. Smorkalov, ..., Dheevatsa Mudigere, Naveen Mellempudi, Sasikanth Avancha, Bharat Kaul, Pradeep Dubey [BDL]
  • The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning (18 Dec 2017): Siyuan Ma, Raef Bassily, M. Belkin
  • AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks (06 Dec 2017): Aditya Devarakonda, Maxim Naumov, M. Garland [ODL]
  • A Resizable Mini-batch Gradient Descent based on a Multi-Armed Bandit (17 Nov 2017): S. Cho, Sunghun Kang, Chang D. Yoo