Train longer, generalize better: closing the generalization gap in large
batch training of neural networks

24 May 2017

Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks"

11 / 161 papers shown

Title
Visualizing the Loss Landscape of Neural Nets | Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein | | 106 | 1,848 | 0 | 28 Dec 2017
Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks | Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong | BDL, ODL | 43 | 6 | 0 | 20 Nov 2017
Three Factors Influencing Minima in SGD | Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey | | 24 | 457 | 0 | 13 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train | V. Codreanu, Damian Podareanu, V. Saletore | | 39 | 55 | 0 | 12 Nov 2017
Stochastic Nonconvex Optimization with Large Minibatches | Weiran Wang, Nathan Srebro | | 36 | 26 | 0 | 25 Sep 2017
Comparison of Batch Normalization and Weight Normalization Algorithms for the Large-scale Image Classification | Igor Gitman, Boris Ginsburg | | 8 | 65 | 0 | 24 Sep 2017
Large Batch Training of Convolutional Networks | Yang You, Igor Gitman, Boris Ginsburg | ODL | 21 | 840 | 0 | 13 Aug 2017
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation | Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean | AIMat | 716 | 6,748 | 0 | 26 Sep 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 308 | 2,892 | 0 | 15 Sep 2016
Effective Approaches to Attention-based Neural Machine Translation | Thang Luong, Hieu H. Pham, Christopher D. Manning | | 218 | 7,926 | 0 | 17 Aug 2015
The Loss Surfaces of Multilayer Networks | A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun | ODL | 183 | 1,185 | 0 | 30 Nov 2014