On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 September 2016 · arXiv 1609.04836 · ODL
Papers citing "On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima" (50 of 514 papers shown)
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
  Xinyan Li, Qilong Gu, Yingxue Zhou, Tiancong Chen, A. Banerjee · ODL · 34 · 51 · 0 · 24 Jul 2019

On improving deep learning generalization with adaptive sparse connectivity
  Shiwei Liu, D. Mocanu, Mykola Pechenizkiy · ODL · 12 · 7 · 0 · 27 Jun 2019

On the Noisy Gradient Descent that Generalizes as SGD
  Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu · MLT · 24 · 10 · 0 · 18 Jun 2019

Learning to Forget for Meta-Learning
  Sungyong Baik, Seokil Hong, Kyoung Mu Lee · CLL, KELM · 16 · 87 · 0 · 13 Jun 2019

The Implicit Bias of AdaGrad on Separable Data
  Qian Qian, Xiaoyuan Qian · 26 · 24 · 0 · 09 Jun 2019
The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
  Ryo Karakida, S. Akaho, S. Amari · 21 · 39 · 0 · 07 Jun 2019

An Empirical Study on Hyperparameters and their Interdependence for RL Generalization
  Xingyou Song, Yilun Du, Jacob Jackson · AI4CE · 19 · 8 · 0 · 02 Jun 2019

Implicit Regularization in Deep Matrix Factorization
  Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo · AI4CE · 24 · 491 · 0 · 31 May 2019

Mixed Precision Training With 8-bit Floating Point
  Naveen Mellempudi, Sudarshan Srinivasan, Dipankar Das, Bharat Kaul · MQ · 16 · 68 · 0 · 29 May 2019

Gram-Gauss-Newton Method: Learning Overparameterized Neural Networks for Regression Problems
  Tianle Cai, Ruiqi Gao, Jikai Hou, Siyu Chen, Dong Wang, Di He, Zhihua Zhang, Liwei Wang · ODL · 21 · 57 · 0 · 28 May 2019
Shaping the learning landscape in neural networks around wide flat minima
  Carlo Baldassi, Fabrizio Pittorino, R. Zecchina · MLT · 13 · 82 · 0 · 20 May 2019

Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models
  Mor Shpigel Nacson, Suriya Gunasekar, J. Lee, Nathan Srebro, Daniel Soudry · 22 · 91 · 0 · 17 May 2019

Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation
  Linfeng Zhang, Jiebo Song, Anni Gao, Jingwei Chen, Chenglong Bao, Kaisheng Ma · FedML · 25 · 844 · 0 · 17 May 2019

Orthogonal Deep Neural Networks
  Kui Jia, Shuai Li, Yuxin Wen, Tongliang Liu, Dacheng Tao · 34 · 131 · 0 · 15 May 2019

Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping
  Wu Dong, Murat Keçeli, Rafael Vescovi, Hanyu Li, Corey Adams, ..., T. Uram, V. Vishwanath, N. Ferrier, B. Kasthuri, P. Littlewood · FedML, AI4CE · 14 · 9 · 0 · 13 May 2019
Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation
  Colin Wei, Tengyu Ma · 17 · 109 · 0 · 09 May 2019

SWALP: Stochastic Weight Averaging in Low-Precision Training
  Guandao Yang, Tianyi Zhang, Polina Kirichenko, Junwen Bai, A. Wilson, Christopher De Sa · 16 · 94 · 0 · 26 Apr 2019

Improved visible to IR image transformation using synthetic data augmentation with cycle-consistent adversarial networks
  Kyongsik Yun, Kevin Yu, Joseph Osborne, S. Eldin, Luan Nguyen, Alexander Huyen, Thomas Lu · GAN · 19 · 19 · 0 · 25 Apr 2019

HARK Side of Deep Learning -- From Grad Student Descent to Automated Machine Learning
  O. Gencoglu, M. Gils, E. Guldogan, Chamin Morikawa, Mehmet Süzen, M. Gruber, J. Leinonen, H. Huttunen · 11 · 36 · 0 · 16 Apr 2019

Large Batch Optimization for Deep Learning: Training BERT in 76 minutes
  Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh · ODL · 28 · 978 · 0 · 01 Apr 2019
Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
  Mingchen Li, Mahdi Soltanolkotabi, Samet Oymak · NoLa · 33 · 351 · 0 · 27 Mar 2019

Improving Strong-Scaling of CNN Training by Exploiting Finer-Grained Parallelism
  Nikoli Dryden, N. Maruyama, Tom Benson, Tim Moon, M. Snir, B. Van Essen · 20 · 49 · 0 · 15 Mar 2019

Deep Learning Based Motion Planning For Autonomous Vehicle Using Spatiotemporal LSTM Network
  Zhengwei Bai, B. Cai, Shangguan Wei, Linguo Chai · 6 · 26 · 0 · 05 Mar 2019

Multilingual Neural Machine Translation with Knowledge Distillation
  Xu Tan, Yi Ren, Di He, Tao Qin, Zhou Zhao, Tie-Yan Liu · 20 · 248 · 0 · 27 Feb 2019

An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
  Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba · ODL · 20 · 22 · 0 · 21 Feb 2019

Parameter Efficient Training of Deep Convolutional Neural Networks by Dynamic Sparse Reparameterization
  Hesham Mostafa, Xin Wang · 29 · 307 · 0 · 15 Feb 2019
Training on the Edge: The why and the how
  Navjot Kukreja, Alena Shilova, Olivier Beaumont, Jan Huckelheim, N. Ferrier, P. Hovland, Gerard Gorman · 14 · 33 · 0 · 13 Feb 2019

Quasi-Newton Methods for Machine Learning: Forget the Past, Just Sample
  A. Berahas, Majid Jahani, Peter Richtárik, Martin Takáč · 16 · 40 · 0 · 28 Jan 2019

Augment your batch: better training with larger batches
  Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry · ODL · 25 · 72 · 0 · 27 Jan 2019

Traditional and Heavy-Tailed Self Regularization in Neural Network Models
  Charles H. Martin, Michael W. Mahoney · 21 · 119 · 0 · 24 Jan 2019

A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
  Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban · 17 · 237 · 0 · 18 Jan 2019

Ensemble Feature for Person Re-Identification
  Jiabao Wang, Yang Li, Zhuang Miao · OOD, 3DPC · 23 · 1 · 0 · 17 Jan 2019
An Empirical Study of Example Forgetting during Deep Neural Network Learning
  Mariya Toneva, Alessandro Sordoni, Rémi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon · 46 · 712 · 0 · 12 Dec 2018

Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent
  Xiaowu Dai, Yuhua Zhu · 25 · 11 · 0 · 03 Dec 2018

Dense xUnit Networks
  I. Kligvasser, T. Michaeli · 21 · 3 · 0 · 27 Nov 2018

Self-Referenced Deep Learning
  Xu Lan, Xiatian Zhu, S. Gong · 27 · 23 · 0 · 19 Nov 2018

A Closer Look at Deep Policy Gradients
  Andrew Ilyas, Logan Engstrom, Shibani Santurkar, Dimitris Tsipras, Firdaus Janoos, Larry Rudolph, Aleksander Madry · 22 · 50 · 0 · 06 Nov 2018

Sequenced-Replacement Sampling for Deep Learning
  C. Ho, Dae Hoon Park, Wei Yang, Yi Chang · 24 · 0 · 0 · 19 Oct 2018

Toward Understanding the Impact of Staleness in Distributed Machine Learning
  Wei-Ming Dai, Yi Zhou, Nanqing Dong, H. M. Zhang, Eric P. Xing · 17 · 79 · 0 · 08 Oct 2018
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
  Charles H. Martin, Michael W. Mahoney · AI4CE · 32 · 190 · 0 · 02 Oct 2018

Interpreting Adversarial Robustness: A View from Decision Surface in Input Space
  Fuxun Yu, Chenchen Liu, Yanzhi Wang, Liang Zhao, Xiang Chen · AAML, OOD · 31 · 27 · 0 · 29 Sep 2018

Don't Use Large Mini-Batches, Use Local SGD
  Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi · 57 · 429 · 0 · 22 Aug 2018

Understanding training and generalization in deep learning by Fourier analysis
  Zhi-Qin John Xu · AI4CE · 19 · 92 · 0 · 13 Aug 2018

Generalization Error in Deep Learning
  Daniel Jakubovitz, Raja Giryes, M. Rodrigues · AI4CE · 32 · 109 · 0 · 03 Aug 2018

Efficient Decentralized Deep Learning by Dynamic Model Averaging
  Michael Kamp, Linara Adilova, Joachim Sicking, Fabian Hüger, Peter Schlicht, Tim Wirtz, Stefan Wrobel · 29 · 128 · 0 · 09 Jul 2018
Optimization of neural networks via finite-value quantum fluctuations
  Masayuki Ohzeki, Shuntaro Okada, Masayoshi Terabe, S. Taguchi · 19 · 21 · 0 · 01 Jul 2018

PCA of high dimensional random walks with comparison to neural network training
  J. Antognini, Jascha Narain Sohl-Dickstein · OOD · 19 · 27 · 0 · 22 Jun 2018

On the Spectral Bias of Neural Networks
  Nasim Rahaman, A. Baratin, Devansh Arpit, Felix Dräxler, Min-Bin Lin, Fred Hamprecht, Yoshua Bengio, Aaron Courville · 45 · 1,390 · 0 · 22 Jun 2018

Laplacian Smoothing Gradient Descent
  Stanley Osher, Bao Wang, Penghang Yin, Xiyang Luo, Farzin Barekat, Minh Pham, A. Lin · ODL · 22 · 43 · 0 · 17 Jun 2018

The Effect of Network Width on the Performance of Large-batch Training
  Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris · 16 · 22 · 0 · 11 Jun 2018