Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1609.04836
Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 514 papers shown
Title
Stochastic Normalized Gradient Descent with Momentum for Large-Batch Training
Shen-Yi Zhao
Chang-Wei Shi
Yin-Peng Xie
Wu-Jun Li
ODL
18
8
0
28 Jul 2020
The Case for Strong Scaling in Deep Learning: Training Large 3D CNNs with Hybrid Parallelism
Yosuke Oyama
N. Maruyama
Nikoli Dryden
Erin McCarthy
P. Harrington
J. Balewski
Satoshi Matsuoka
Peter Nugent
B. Van Essen
3DV
AI4CE
32
37
0
25 Jul 2020
Linear discriminant initialization for feed-forward neural networks
Marissa Masden
D. Sinha
FedML
29
3
0
24 Jul 2020
Explicit Regularisation in Gaussian Noise Injections
A. Camuto
M. Willetts
Umut Simsekli
Stephen J. Roberts
Chris Holmes
23
55
0
14 Jul 2020
Beyond Graph Neural Networks with Lifted Relational Neural Networks
Gustav Sourek
F. Železný
Ondrej Kuzelka
NAI
41
17
0
13 Jul 2020
DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training
Weiyan Wang
Cengguang Zhang
Liu Yang
Kai Chen
Kun Tan
29
12
0
07 Jul 2020
When Does Preconditioning Help or Hurt Generalization?
S. Amari
Jimmy Ba
Roger C. Grosse
Xuechen Li
Atsushi Nitanda
Taiji Suzuki
Denny Wu
Ji Xu
36
32
0
18 Jun 2020
What Do Neural Networks Learn When Trained With Random Labels?
Hartmut Maennel
Ibrahim M. Alabdulmohsin
Ilya O. Tolstikhin
R. Baldock
Olivier Bousquet
Sylvain Gelly
Daniel Keysers
FedML
43
87
0
18 Jun 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol
S. Zohren
Stephen J. Roberts
ODL
37
48
0
16 Jun 2020
Shape Matters: Understanding the Implicit Bias of the Noise Covariance
Jeff Z. HaoChen
Colin Wei
J. Lee
Tengyu Ma
29
93
0
15 Jun 2020
On the Loss Landscape of Adversarial Training: Identifying Challenges and How to Overcome Them
Chen Liu
Mathieu Salzmann
Tao R. Lin
Ryota Tomioka
Sabine Süsstrunk
AAML
21
81
0
15 Jun 2020
Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption
Xu Sun
Zhiyuan Zhang
Xuancheng Ren
Ruixuan Luo
Liangyou Li
22
39
0
10 Jun 2020
Speedy Performance Estimation for Neural Architecture Search
Binxin Ru
Clare Lyle
Lisa Schut
M. Fil
Mark van der Wilk
Y. Gal
18
36
0
08 Jun 2020
Automated Copper Alloy Grain Size Evaluation Using a Deep-learning CNN
George S. Baggs
P. Guerrier
A. Loeb
Jason C. Jones
21
9
0
20 May 2020
Learning Rate Annealing Can Provably Help Generalization, Even for Convex Problems
Preetum Nakkiran
MLT
25
21
0
15 May 2020
Pruning artificial neural networks: a way to find well-generalizing, high-entropy sharp minima
Enzo Tartaglione
Andrea Bragagnolo
Marco Grangetto
26
11
0
30 Apr 2020
FlexSA: Flexible Systolic Array Architecture for Efficient Pruned DNN Model Training
Sangkug Lym
M. Erez
13
25
0
27 Apr 2020
Generative Data Augmentation for Commonsense Reasoning
Yiben Yang
Chaitanya Malaviya
Jared Fernandez
Swabha Swayamdipta
Ronan Le Bras
Ji-ping Wang
Chandra Bhagavatula
Yejin Choi
Doug Downey
LRM
22
91
0
24 Apr 2020
AL2: Progressive Activation Loss for Learning General Representations in Classification Neural Networks
Majed El Helou
Frederike Dumbgen
Sabine Süsstrunk
CLL
AI4CE
30
2
0
07 Mar 2020
Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang
Dezun Dong
Yemao Xu
Liquan Xiao
30
12
0
06 Mar 2020
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
159
234
0
04 Mar 2020
Coherent Gradients: An Approach to Understanding Generalization in Gradient Descent-based Optimization
S. Chatterjee
ODL
OOD
11
48
0
25 Feb 2020
The Two Regimes of Deep Network Training
Guillaume Leclerc
A. Madry
13
45
0
24 Feb 2020
Communication-Efficient Edge AI: Algorithms and Systems
Yuanming Shi
Kai Yang
Tao Jiang
Jun Zhang
Khaled B. Letaief
GNN
17
326
0
22 Feb 2020
The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski
Maciej Szymczak
Stanislav Fort
Devansh Arpit
Jacek Tabor
Kyunghyun Cho
Krzysztof J. Geras
50
154
0
21 Feb 2020
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
A. Wilson
Pavel Izmailov
UQCV
BDL
OOD
24
639
0
20 Feb 2020
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie
Issei Sato
Masashi Sugiyama
ODL
20
17
0
10 Feb 2020
Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
Vipul Gupta
S. Serrano
D. DeCoste
MoMe
38
55
0
07 Jan 2020
'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
Moshir Harsh
J. Tubiana
Simona Cocco
R. Monasson
9
14
0
30 Dec 2019
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
19
168
0
19 Dec 2019
InfoCNF: An Efficient Conditional Continuous Normalizing Flow with Adaptive Solvers
T. Nguyen
Animesh Garg
Richard G. Baraniuk
Anima Anandkumar
TPM
28
9
0
09 Dec 2019
The Group Loss for Deep Metric Learning
Ismail Elezi
Sebastiano Vascon
Alessandro Torcinovich
Marcello Pelillo
Laura Leal-Taixe
14
50
0
01 Dec 2019
Information-Theoretic Local Minima Characterization and Regularization
Zhiwei Jia
Hao Su
27
19
0
19 Nov 2019
Small-GAN: Speeding Up GAN Training Using Core-sets
Samarth Sinha
Hang Zhang
Anirudh Goyal
Yoshua Bengio
Hugo Larochelle
Augustus Odena
GAN
35
72
0
29 Oct 2019
KonIQ-10k: An ecologically valid database for deep learning of blind image quality assessment
Vlad Hosu
Hanhe Lin
T. Szirányi
Dietmar Saupe
14
552
0
14 Oct 2019
Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin
Colin Wei
Tengyu Ma
AAML
OOD
36
85
0
09 Oct 2019
Parallelizing Training of Deep Generative Models on Massive Scientific Datasets
S. A. Jacobs
B. Van Essen
D. Hysom
Jae-Seung Yeom
Tim Moon
...
J. Gaffney
Tom Benson
Peter B. Robinson
L. Peterson
B. Spears
BDL
AI4CE
22
17
0
05 Oct 2019
GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks
Avraam Chatzimichailidis
Franz-Josef Pfreundt
N. Gauger
J. Keuper
19
10
0
26 Sep 2019
Towards Understanding the Transferability of Deep Representations
Hong Liu
Mingsheng Long
Jianmin Wang
Michael I. Jordan
30
25
0
26 Sep 2019
A Closer Look at Domain Shift for Deep Learning in Histopathology
Karin Stacke
Gabriel Eilertsen
Jonas Unger
Claes Lundström
OOD
10
63
0
25 Sep 2019
EEG-Based Driver Drowsiness Estimation Using Feature Weighted Episodic Training
Yuqi Cui
Yifan Xu
Dongrui Wu
11
62
0
25 Sep 2019
Scale MLPerf-0.6 models on Google TPU-v3 Pods
Sameer Kumar
Victor Bitorff
Dehao Chen
Chi-Heng Chou
Blake A. Hechtman
...
Peter Mattson
Shibo Wang
Tao Wang
Yuanzhong Xu
Zongwei Zhou
8
39
0
21 Sep 2019
Understanding and Robustifying Differentiable Architecture Search
Arber Zela
T. Elsken
Tonmoy Saikia
Yassine Marrakchi
Thomas Brox
Frank Hutter
OOD
AAML
66
366
0
20 Sep 2019
Regularizing CNN Transfer Learning with Randomised Regression
Yang Zhong
A. Maki
16
13
0
16 Aug 2019
Visualizing and Understanding the Effectiveness of BERT
Y. Hao
Li Dong
Furu Wei
Ke Xu
22
181
0
15 Aug 2019
Instance Enhancement Batch Normalization: an Adaptive Regulator of Batch Noise
Senwei Liang
Zhongzhan Huang
Mingfu Liang
Haizhao Yang
27
57
0
12 Aug 2019
Progressive Transfer Learning
Zhengxu Yu
Long Wei
Zhongming Jin
Jianqiang Huang
Deng Cai
Xiansheng Hua
VLM
24
10
0
07 Aug 2019
How Does Learning Rate Decay Help Modern Neural Networks?
Kaichao You
Mingsheng Long
Jianmin Wang
Michael I. Jordan
24
4
0
05 Aug 2019
On the Existence of Simpler Machine Learning Models
Lesia Semenova
Cynthia Rudin
Ronald E. Parr
26
85
0
05 Aug 2019
Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training
Saptadeep Pal
Eiman Ebrahimi
A. Zulfiqar
Yaosheng Fu
Victor Zhang
Szymon Migacz
D. Nellans
Puneet Gupta
34
55
0
30 Jul 2019
Previous
1
2
3
...
10
11
7
8
9
Next