Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent

3 December 2018
Xiaowu Dai, Yuhua Zhu

Papers citing "Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent"

4 / 4 papers shown

1. Beyond the Quadratic Approximation: the Multiscale Structure of Neural Network Loss Landscapes
   Chao Ma, D. Kunin, Lei Wu, Lexing Ying
   24 Apr 2022

2. Reproducing Activation Function for Deep Learning
   Senwei Liang, Liyao Lyu, Chunmei Wang, Haizhao Yang
   13 Jan 2021

3. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
   N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
   15 Sep 2016

4. The Loss Surfaces of Multilayer Networks
   A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun
   30 Nov 2014