Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer, Itay Hubara, Daniel Soudry
arXiv:1705.08741 [ODL], 24 May 2017
Papers citing "Train longer, generalize better: closing the generalization gap in large batch training of neural networks" (50 of 156 shown):
- Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models. Cheolhyoung Lee, Kyunghyun Cho, Wanmo Kang. 25 Sep 2019. [MoE]
- TabNet: Attentive Interpretable Tabular Learning. Sercan Ö. Arik, Tomas Pfister. 20 Aug 2019. [LMTD]
- Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training. Saptadeep Pal, Eiman Ebrahimi, A. Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, D. Nellans, Puneet Gupta. 30 Jul 2019.
- Faster Neural Network Training with Data Echoing. Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl. 12 Jul 2019.
- On the Noisy Gradient Descent that Generalizes as SGD. Jingfeng Wu, Wenqing Hu, Haoyi Xiong, Jun Huan, Vladimir Braverman, Zhanxing Zhu. 18 Jun 2019. [MLT]
- The Implicit Bias of AdaGrad on Separable Data. Qian Qian, Xiaoyuan Qian. 09 Jun 2019.
- Implicit Regularization in Deep Matrix Factorization. Sanjeev Arora, Nadav Cohen, Wei Hu, Yuping Luo. 31 May 2019. [AI4CE]
- Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models. Mor Shpigel Nacson, Suriya Gunasekar, J. Lee, Nathan Srebro, Daniel Soudry. 17 May 2019.
- Scaling Distributed Training of Flood-Filling Networks on HPC Infrastructure for Brain Mapping. Wu Dong, Murat Keçeli, Rafael Vescovi, Hanyu Li, Corey Adams, ..., T. Uram, V. Vishwanath, N. Ferrier, B. Kasthuri, P. Littlewood. 13 May 2019. [FedML, AI4CE]
- Data-dependent Sample Complexity of Deep Neural Networks via Lipschitz Augmentation. Colin Wei, Tengyu Ma. 09 May 2019.
- Dynamic Mini-batch SGD for Elastic Distributed Training: Learning in the Limbo of Resources. Yanghua Peng, Hang Zhang, Yifei Ma, Tong He, Zhi-Li Zhang, Sheng Zha, Mu Li. 26 Apr 2019.
- Large Batch Optimization for Deep Learning: Training BERT in 76 minutes. Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh. 01 Apr 2019. [ODL]
- An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise. Yeming Wen, Kevin Luk, Maxime Gazeau, Guodong Zhang, Harris Chan, Jimmy Ba. 21 Feb 2019. [ODL]
- Random Search and Reproducibility for Neural Architecture Search. Liam Li, Ameet Talwalkar. 20 Feb 2019. [OOD]
- Asymmetric Valleys: Beyond Sharp and Flat Local Minima. Haowei He, Gao Huang, Yang Yuan. 02 Feb 2019. [ODL, MLT]
- Augment your batch: better training with larger batches. Elad Hoffer, Tal Ben-Nun, Itay Hubara, Niv Giladi, Torsten Hoefler, Daniel Soudry. 27 Jan 2019. [ODL]
- Traditional and Heavy-Tailed Self Regularization in Neural Network Models. Charles H. Martin, Michael W. Mahoney. 24 Jan 2019.
- Measurements of Three-Level Hierarchical Structure in the Outliers in the Spectrum of Deepnet Hessians. Vardan Papyan. 24 Jan 2019.
- A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks. Umut Simsekli, Levent Sagun, Mert Gurbuzbalaban. 18 Jan 2019.
- CROSSBOW: Scaling Deep Learning with Small Batch Sizes on Multi-GPU Servers. A. Koliousis, Pijika Watcharapichat, Matthias Weidlich, Luo Mai, Paolo Costa, Peter R. Pietzuch. 08 Jan 2019.
- Scaling description of generalization with number of parameters in deep learning. Mario Geiger, Arthur Jacot, S. Spigler, Franck Gabriel, Levent Sagun, Stéphane d'Ascoli, Giulio Biroli, Clément Hongler, M. Wyart. 06 Jan 2019.
- Towards Theoretical Understanding of Large Batch Training in Stochastic Gradient Descent. Xiaowu Dai, Yuhua Zhu. 03 Dec 2018.
- LEARN Codes: Inventing Low-latency Codes via Recurrent Neural Networks. Yihan Jiang, Hyeji Kim, Himanshu Asnani, Sreeram Kannan, Sewoong Oh, Pramod Viswanath. 30 Nov 2018.
- Neural Sign Language Translation based on Human Keypoint Estimation. Sang-Ki Ko, Chang Jo Kim, Hyedong Jung, C. Cho. 28 Nov 2018. [SLR]
- Deep Frank-Wolfe For Neural Network Optimization. Leonard Berrada, Andrew Zisserman, M. P. Kumar. 19 Nov 2018. [ODL]
- Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning. Charles H. Martin, Michael W. Mahoney. 02 Oct 2018. [AI4CE]
- Removing the Feature Correlation Effect of Multiplicative Noise. Zijun Zhang, Yining Zhang, Zongpeng Li. 19 Sep 2018.
- Don't Use Large Mini-Batches, Use Local SGD. Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi. 22 Aug 2018.
- Generalization Error in Deep Learning. Daniel Jakubovitz, Raja Giryes, M. Rodrigues. 03 Aug 2018. [AI4CE]
- Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks. Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu. 18 Jun 2018. [ODL]
- The Effect of Network Width on the Performance of Large-batch Training. Lingjiao Chen, Hongyi Wang, Jinman Zhao, Dimitris Papailiopoulos, Paraschos Koutris. 11 Jun 2018.
- Training Faster by Separating Modes of Variation in Batch-normalized Models. Mahdi M. Kalayeh, M. Shah. 07 Jun 2018.
- Stochastic Gradient Descent on Separable Data: Exact Convergence with a Fixed Learning Rate. Mor Shpigel Nacson, Nathan Srebro, Daniel Soudry. 05 Jun 2018. [FedML, MLT]
- Backdrop: Stochastic Backpropagation. Siavash Golkar, Kyle Cranmer. 04 Jun 2018.
- Scaling Neural Machine Translation. Myle Ott, Sergey Edunov, David Grangier, Michael Auli. 01 Jun 2018. [AIMat]
- Understanding Batch Normalization. Johan Bjorck, Carla P. Gomes, B. Selman, Kilian Q. Weinberger. 01 Jun 2018.
- SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning. W. Wen, Yandan Wang, Feng Yan, Cong Xu, Chunpeng Wu, Yiran Chen, H. Li. 21 May 2018.
- HG-means: A scalable hybrid genetic algorithm for minimum sum-of-squares clustering. Daniel Gribel, Thibaut Vidal. 25 Apr 2018.
- Revisiting Small Batch Training for Deep Neural Networks. Dominic Masters, Carlo Luschi. 20 Apr 2018. [ODL]
- Comparing Dynamics: Deep Neural Networks versus Glassy Systems. Marco Baity-Jesi, Levent Sagun, Mario Geiger, S. Spigler, Gerard Ben Arous, C. Cammarota, Yann LeCun, M. Wyart, Giulio Biroli. 19 Mar 2018. [AI4CE]
- Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis. Tal Ben-Nun, Torsten Hoefler. 26 Feb 2018. [GNN]
- A Walk with SGD. Chen Xing, Devansh Arpit, Christos Tsirigotis, Yoshua Bengio. 24 Feb 2018.
- Characterizing Implicit Bias in Terms of Optimization Geometry. Suriya Gunasekar, Jason D. Lee, Daniel Soudry, Nathan Srebro. 22 Feb 2018. [AI4CE]
- The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks. Nicholas Carlini, Chang-rui Liu, Ulfar Erlingsson, Jernej Kos, D. Song. 22 Feb 2018.
- Fix your classifier: the marginal value of training the last weight layer. Elad Hoffer, Itay Hubara, Daniel Soudry. 14 Jan 2018.
- Visualizing the Loss Landscape of Neural Nets. Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein. 28 Dec 2017.
- Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks. Kensuke Nakamura, Stefano Soatto, Byung-Woo Hong. 20 Nov 2017. [BDL, ODL]
- Three Factors Influencing Minima in SGD. Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey. 13 Nov 2017.
- Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train. V. Codreanu, Damian Podareanu, V. Saletore. 12 Nov 2017.
- Stochastic Nonconvex Optimization with Large Minibatches. Weiran Wang, Nathan Srebro. 25 Sep 2017.