arXiv:1902.08234
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
21 February 2019
Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba [ODL]

Papers citing "An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise"

47 / 47 papers shown
1. Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model
   Guodong Zhang, Lala Li, Zachary Nado, James Martens, Sushant Sachdeva, George E. Dahl, Christopher J. Shallue, Roger C. Grosse (09 Jul 2019)
2. Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent
   Jaehoon Lee, Lechao Xiao, S. Schoenholz, Yasaman Bahri, Roman Novak, Jascha Narain Sohl-Dickstein, Jeffrey Pennington (18 Feb 2019)
3. Measuring the Effects of Data Parallelism on Neural Network Training
   Christopher J. Shallue, Jaehoon Lee, J. Antognini, J. Mamou, J. Ketterling, Yao Wang (08 Nov 2018)
4. Three Mechanisms of Weight Decay Regularization
   Guodong Zhang, Chaoqi Wang, Bowen Xu, Roger C. Grosse (29 Oct 2018)
5. A Coordinate-Free Construction of Scalable Natural Gradient
   Kevin Luk, Roger C. Grosse (30 Aug 2018)
6. A Surprising Linear Relationship Predicts Test Performance in Deep Networks
   Q. Liao, Brando Miranda, Andrzej Banburski, Jack Hidary, T. Poggio (25 Jul 2018)
7. How Does Batch Normalization Help Optimization?
   Shibani Santurkar, Dimitris Tsipras, Andrew Ilyas, Aleksander Madry [ODL] (29 May 2018)
8. Stability and Convergence Trade-off of Iterative Optimization Algorithms
   Yuansi Chen, Chi Jin, Bin Yu (04 Apr 2018)
9. Flipout: Efficient Pseudo-Independent Weight Perturbations on Mini-Batches
   Yeming Wen, Paul Vicol, Jimmy Ba, Dustin Tran, Roger C. Grosse [BDL] (12 Mar 2018)
10. Understanding Short-Horizon Bias in Stochastic Meta-Optimization
    Yuhuai Wu, Mengye Ren, Renjie Liao, Roger C. Grosse (06 Mar 2018)
11. The Anisotropic Noise in Stochastic Gradient Descent: Its Behavior of Escaping from Sharp Minima and Regularization Effects
    Zhanxing Zhu, Jingfeng Wu, Ting Yu, Lei Wu, Jin Ma (01 Mar 2018)
12. A Walk with SGD
    Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio (24 Feb 2018)
13. Characterizing Implicit Bias in Terms of Optimization Geometry
    Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro [AI4CE] (22 Feb 2018)
14. On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
    Sanjeev Arora, Nadav Cohen, Elad Hazan (19 Feb 2018)
15. Noisy Natural Gradient as Variational Inference
    Guodong Zhang, Shengyang Sun, David Duvenaud, Roger C. Grosse [ODL] (06 Dec 2017)
16. Three Factors Influencing Minima in SGD
    Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey (13 Nov 2017)
17. Don't Decay the Learning Rate, Increase the Batch Size
    Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le [ODL] (01 Nov 2017)
18. Stochastic gradient descent performs variational inference, converges to limit cycles for deep networks
    Pratik Chaudhari, Stefano Soatto [MLT] (30 Oct 2017)
19. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms
    Han Xiao, Kashif Rasul, Roland Vollgraf (25 Aug 2017)
20. Generalization Bounds of SGLD for Non-convex Learning: Two Theoretical Viewpoints
    Wenlong Mou, Liwei Wang, Xiyu Zhai, Kai Zheng [MLT] (19 Jul 2017)
21. Exploring Generalization in Deep Learning
    Behnam Neyshabur, Srinadh Bhojanapalli, David A. McAllester, Nathan Srebro [FAtt] (27 Jun 2017)
22. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
    Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He [3DH] (08 Jun 2017)
23. Spectral Norm Regularization for Improving the Generalizability of Deep Learning
    Yuichi Yoshida, Takeru Miyato (31 May 2017)
24. Train longer, generalize better: closing the generalization gap in large batch training of neural networks
    Elad Hoffer, Itay Hubara, Daniel Soudry [ODL] (24 May 2017)
25. The Marginal Value of Adaptive Gradient Methods in Machine Learning
    Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht [ODL] (23 May 2017)
26. Geometry of Optimization and Implicit Regularization in Deep Learning
    Behnam Neyshabur, Ryota Tomioka, Ruslan Salakhutdinov, Nathan Srebro [AI4CE] (08 May 2017)
27. Stochastic Gradient Descent as Approximate Bayesian Inference
    Stephan Mandt, Matthew D. Hoffman, David M. Blei [BDL] (13 Apr 2017)
28. How to Escape Saddle Points Efficiently
    Chi Jin, Rong Ge, Praneeth Netrapalli, Sham Kakade, Michael I. Jordan [ODL] (02 Mar 2017)
29. Non-convex learning via Stochastic Gradient Langevin Dynamics: a nonasymptotic analysis
    Maxim Raginsky, Alexander Rakhlin, Matus Telgarsky (13 Feb 2017)
30. Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
    Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina [ODL] (06 Nov 2016)
31. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
    Yonghui Wu, M. Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean [AIMat] (26 Sep 2016)
32. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
    N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang [ODL] (15 Sep 2016)
33. Optimization Methods for Large-Scale Machine Learning
    Léon Bottou, Frank E. Curtis, J. Nocedal (15 Jun 2016)
34. A Kronecker-factored approximate Fisher matrix for convolution layers
    Roger C. Grosse, James Martens [ODL] (03 Feb 2016)
35. Deep Residual Learning for Image Recognition
    Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun [MedIm] (10 Dec 2015)
36. Deep Speech 2: End-to-End Speech Recognition in English and Mandarin
    Dario Amodei, Rishita Anubhai, Eric Battenberg, Carl Case, Jared Casper, ..., Chong-Jun Wang, Bo Xiao, Dani Yogatama, J. Zhan, Zhenyao Zhu (08 Dec 2015)
37. Adding Gradient Noise Improves Learning for Very Deep Networks
    Arvind Neelakantan, Luke Vilnis, Quoc V. Le, Ilya Sutskever, Lukasz Kaiser, Karol Kurach, James Martens [AI4CE, ODL] (21 Nov 2015)
38. Stochastic modified equations and adaptive stochastic gradient algorithms
    Qianxiao Li, Cheng Tai, E. Weinan (19 Nov 2015)
39. Efficient Per-Example Gradient Computations
    Ian Goodfellow (07 Oct 2015)
40. Train faster, generalize better: Stability of stochastic gradient descent
    Moritz Hardt, Benjamin Recht, Y. Singer (03 Sep 2015)
41. Optimizing Neural Networks with Kronecker-factored Approximate Curvature
    James Martens, Roger C. Grosse [ODL] (19 Mar 2015)
42. Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition
    Rong Ge, Furong Huang, Chi Jin, Yang Yuan (06 Mar 2015)
43. New insights and perspectives on the natural gradient method
    James Martens [ODL] (03 Dec 2014)
44. The Loss Surfaces of Multilayer Networks
    A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun [ODL] (30 Nov 2014)
45. Very Deep Convolutional Networks for Large-Scale Image Recognition
    Karen Simonyan, Andrew Zisserman [FAtt, MDE] (04 Sep 2014)
46. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
    Aaron Defazio, Francis R. Bach, Simon Lacoste-Julien [ODL] (01 Jul 2014)
47. No More Pesky Learning Rates
    Tom Schaul, Sixin Zhang, Yann LeCun (06 Jun 2012)