Towards Theoretical Understanding of Large Batch Training in Stochastic
Gradient Descent

Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent

3 December 2018

Yuhua Zhu

Papers citing "Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent"

5 / 5 papers shown

Title
Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes Chao Ma D. Kunin Lei Wu Lexing Ying 25 27 0 24 Apr 2022
Stochastic Training is Not Necessary for Generalization Jonas Geiping Micah Goldblum Phillip E. Pope Michael Moeller Tom Goldstein 89 72 0 29 Sep 2021
Reproducing Activation Function for Deep Learning Senwei Liang Liyao Lyu Chunmei Wang Haizhao Yang 36 21 0 13 Jan 2021
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. Keskar Dheevatsa Mudigere J. Nocedal M. Smelyanskiy P. T. P. Tang ODL 284 2,890 0 15 Sep 2016
The Loss Surfaces of Multilayer Networks A. Choromańska Mikael Henaff Michaël Mathieu Gerard Ben Arous Yann LeCun ODL 179 1,185 0 30 Nov 2014