Don't Decay the Learning Rate, Increase the Batch Size (arXiv 1711.00489)
Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le (1 November 2017) [ODL]
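The paper's titular recipe amounts to replacing a stepwise learning-rate decay with a proportional stepwise increase in the batch size, keeping the learning rate fixed. Below is a minimal NumPy sketch of that substitution (not the authors' code); the toy least-squares problem, the initial batch size of 128, the schedule epochs 30/60/80, and the growth factor of 5 are illustrative assumptions, not values from the paper.

```python
# Sketch: hold the learning rate constant and grow the batch size at the
# epochs where a step-decay schedule would have shrunk the learning rate.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(10_000, 20))            # toy regression inputs
w_true = rng.normal(size=20)
y = X @ w_true + 0.1 * rng.normal(size=10_000)

w = np.zeros(20)
lr = 0.1                                     # never decayed
batch_size = 128                             # grows instead
schedule = {30: 5, 60: 5, 80: 5}             # epoch -> batch-size factor

for epoch in range(100):
    if epoch in schedule:
        batch_size *= schedule[epoch]        # "increase the batch size" step
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        b = idx[start:start + batch_size]
        grad = X[b].T @ (X[b] @ w - y[b]) / len(b)   # mini-batch gradient
        w -= lr * grad
```

The paper further combines this swap with raising the learning rate and momentum so that even larger batches can be used; the sketch shows only the basic substitution.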
Papers citing "Don't Decay the Learning Rate, Increase the Batch Size" (showing 50 of 454)
How Data Augmentation affects Optimization for Linear Regression
Boris Hanin, Yi Sun (21 Oct 2020)

Just Pick a Sign: Optimizing Deep Multitask Models with Gradient Sign Dropout
Zhao Chen, Jiquan Ngiam, Yanping Huang, Thang Luong, Henrik Kretzschmar, Yuning Chai, Dragomir Anguelov (14 Oct 2020)

WHO 2016 subtyping and automated segmentation of glioma using multi-task deep learning
S. V. D. Voort, Fatih Incekara, M. Wijnenga, G. Kapsas, R. Gahrmann, ..., A. Vincent, W. Niessen, M. Bent, M. Smits, S. Klein (09 Oct 2020)

Genetic-algorithm-optimized neural networks for gravitational wave classification
Dwyer Deighan, Scott E. Field, C. Capano, G. Khanna (09 Oct 2020)

COVID-19 Classification Using Staked Ensembles: A Comprehensive Analysis
B. LalithBharadwaj, Rohit Boddeda, K. Vardhan, G. Madhu (07 Oct 2020)

Reconciling Modern Deep Learning with Traditional Optimization Analyses: The Intrinsic Learning Rate
Zhiyuan Li, Kaifeng Lyu, Sanjeev Arora (06 Oct 2020)

Improved generalization by noise enhancement
Takashi Mori, Masahito Ueda (28 Sep 2020)
Improved Modeling of 3D Shapes with Multi-view Depth Maps
Kamal Gupta, Susmija Jabbireddy, Ketul Shah, Abhinav Shrivastava, Matthias Zwicker (07 Sep 2020) [3DV]

S-SGD: Symmetrical Stochastic Gradient Descent with Weight Noise Injection for Reaching Flat Minima
Wonyong Sung, Iksoo Choi, Jinhwan Park, Seokhyun Choi, Sungho Shin (05 Sep 2020) [ODL]

Pollux: Co-adaptive Cluster Scheduling for Goodput-Optimized Deep Learning
Aurick Qiao, Sang Keun Choe, Suhas Jayaram Subramanya, Willie Neiswanger, Qirong Ho, Hao Zhang, G. Ganger, Eric Xing (27 Aug 2020) [VLM]

Relevance of Rotationally Equivariant Convolutions for Predicting Molecular Properties
Benjamin Kurt Miller, Mario Geiger, Tess E. Smidt, Frank Noé (19 Aug 2020)

A Survey on Large-scale Machine Learning
Meng Wang, Weijie Fu, Xiangnan He, Shijie Hao, Xindong Wu (10 Aug 2020)

Linear discriminant initialization for feed-forward neural networks
Marissa Masden, D. Sinha (24 Jul 2020) [FedML]

On stochastic mirror descent with interacting particles: convergence properties and variance reduction
Anastasia Borovykh, N. Kantas, P. Parpas, G. Pavliotis (15 Jul 2020)
Analyzing and Mitigating Data Stalls in DNN Training
Jayashree Mohan, Amar Phanishayee, Ashish Raniwala, Vijay Chidambaram (14 Jul 2020)

Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
Peng Jiang, G. Agrawal (13 Jul 2020)

AdaScale SGD: A User-Friendly Algorithm for Distributed Training
Tyler B. Johnson, Pulkit Agrawal, Haijie Gu, Carlos Guestrin (09 Jul 2020) [ODL]

Coded Distributed Computing with Partial Recovery
Emre Ozfatura, S. Ulukus, Deniz Gunduz (04 Jul 2020)

Variance reduction for Riemannian non-convex optimization with batch size adaptation
Andi Han, Junbin Gao (03 Jul 2020)

Gradient-only line searches to automatically determine learning rates for a variety of stochastic training algorithms
D. Kafka, D. Wilke (29 Jun 2020) [ODL]

Is SGD a Bayesian sampler? Well, almost
Chris Mingard, Guillermo Valle Pérez, Joar Skalse, A. Louis (26 Jun 2020) [BDL]

On the Generalization Benefit of Noise in Stochastic Gradient Descent
Samuel L. Smith, Erich Elsen, Soham De (26 Jun 2020) [MLT]
Effective Elastic Scaling of Deep Learning Workloads
Vaibhav Saxena, K.R. Jayaram, Saurav Basu, Yogish Sabharwal, Ashish Verma (24 Jun 2020)

Hippo: Taming Hyper-parameter Optimization of Deep Learning with Stage Trees
Ahnjae Shin, Do Yoon Kim, Joo Seong Jeong, Byung-Gon Chun (22 Jun 2020)

How do SGD hyperparameters in natural training affect adversarial robustness?
Sandesh Kamath, Amit Deshpande, K. Subrahmanyam (20 Jun 2020) [AAML]

An Online Method for A Class of Distributionally Robust Optimization with Non-Convex Objectives
Qi Qi, Zhishuai Guo, Yi Tian Xu, Rong Jin, Tianbao Yang (17 Jun 2020)

Fine-Grained Stochastic Architecture Search
S. Chaudhuri, Elad Eban, Hanhan Li, Max Moroz, Yair Movshovitz-Attias (17 Jun 2020)

Gradient Amplification: An efficient way to train deep neural networks
S. Basodi, Chunyan Ji, Haiping Zhang, Yi Pan (16 Jun 2020) [ODL]

Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol, S. Zohren, Stephen J. Roberts (16 Jun 2020) [ODL]

Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen, Colin Wei, Jason D. Lee, Tengyu Ma (15 Jun 2020)
The Limit of the Batch Size
Yang You, Yuhui Wang, Huan Zhang, Zhao-jie Zhang, J. Demmel, Cho-Jui Hsieh (15 Jun 2020)

Understanding the Role of Training Regimes in Continual Learning
Seyed Iman Mirzadeh, Mehrdad Farajtabar, Razvan Pascanu, H. Ghasemzadeh (12 Jun 2020) [CLL]

Supervised Learning of Sparsity-Promoting Regularizers for Denoising
Michael T. McCann, S. Ravishankar (09 Jun 2020)

Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran (15 May 2020) [MLT]

OD-SGD: One-step Delay Stochastic Gradient Descent for Distributed Training
Yemao Xu, Dezun Dong, Weixia Xu, Xiangke Liao (14 May 2020)

Dynamically Adjusting Transformer Batch Size by Monitoring Gradient Direction Change
Hongfei Xu, Josef van Genabith, Deyi Xiong, Qiuhui Liu (05 May 2020)

Adaptive Learning of the Optimal Batch Size of SGD
Motasem Alfarra, Slavomir Hanzely, Alyazeed Albasyoni, Guohao Li, Peter Richtárik (03 May 2020)

Dynamic backup workers for parallel machine learning
Chuan Xu, Giovanni Neglia, Nicola Sebastianelli (30 Apr 2020)

DIET: Lightweight Language Understanding for Dialogue Systems
Tanja Bunk, Daksh Varshneya, Vladimir Vlasov, Alan Nichol (21 Apr 2020)
On Learning Rates and Schrödinger Operators
Bin Shi, Weijie J. Su, Michael I. Jordan (15 Apr 2020)

Stochastic batch size for adaptive regularization in deep network optimization
Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong (14 Apr 2020) [ODL]

Understanding Learning Dynamics for Neural Machine Translation
Conghui Zhu, Guanlin Li, Lemao Liu, Tiejun Zhao, Shuming Shi (05 Apr 2020)

Predicting the outputs of finite deep neural networks trained with noisy gradients
Gadi Naveh, Oded Ben-David, H. Sompolinsky, Zohar Ringel (02 Apr 2020)

Understanding the Effects of Data Parallelism and Sparsity on Neural Network Training
Namhoon Lee, Thalaiyasingam Ajanthan, Philip Torr, Martin Jaggi (25 Mar 2020)

The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali, Yan Sun, Robert Tibshirani (17 Mar 2020)

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao (06 Mar 2020)

The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari (04 Mar 2020) [ODL]
Stagewise Enlargement of Batch Size for SGD-based Learning
Shen-Yi Zhao, Yin-Peng Xie, Wu-Jun Li (26 Feb 2020)

Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers
Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkateswara Dasari, S. E. Rouayheb (25 Feb 2020)

Baryon acoustic oscillations reconstruction using convolutional neural networks
Tianxiang Mao, Jie-Shuang Wang, Baojiu Li, Yan-Chuan Cai, B. Falck, M. Neyrinck, A. Szalay (24 Feb 2020)