Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1609.04836
Cited By
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
50 / 514 papers shown
Title
Training Faster by Separating Modes of Variation in Batch-normalized Models
Mahdi M. Kalayeh
M. Shah
27
42
0
07 Jun 2018
Backdrop: Stochastic Backpropagation
Siavash Golkar
Kyle Cranmer
36
2
0
04 Jun 2018
Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida
S. Akaho
S. Amari
FedML
47
140
0
04 Jun 2018
Understanding Batch Normalization
Johan Bjorck
Carla P. Gomes
B. Selman
Kilian Q. Weinberger
18
593
0
01 Jun 2018
How Does Batch Normalization Help Optimization?
Shibani Santurkar
Dimitris Tsipras
Andrew Ilyas
A. Madry
ODL
27
1,521
0
29 May 2018
Distilling Knowledge for Search-based Structured Prediction
Yijia Liu
Wanxiang Che
Huaipeng Zhao
Bing Qin
Ting Liu
19
22
0
29 May 2018
Investigating Label Noise Sensitivity of Convolutional Neural Networks for Fine Grained Audio Signal Labelling
Rainer Kelz
Gerhard Widmer
NoLa
11
4
0
28 May 2018
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
76
1,043
0
24 May 2018
Stochastic modified equations for the asynchronous stochastic gradient descent
Jing An
Jian-wei Lu
Lexing Ying
21
79
0
21 May 2018
SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen
Yandan Wang
Feng Yan
Cong Xu
Chunpeng Wu
Yiran Chen
H. Li
24
50
0
21 May 2018
Unifying Data, Model and Hybrid Parallelism in Deep Learning via Tensor Tiling
Minjie Wang
Chien-chin Huang
Jinyang Li
FedML
24
25
0
10 May 2018
On Visual Hallmarks of Robustness to Adversarial Malware
Alex Huang
Abdullah Al-Dujaili
Erik Hemberg
Una-May O’Reilly
AAML
25
7
0
09 May 2018
SHADE: Information Based Regularization for Deep Learning
Michael Blot
Thomas Robert
Nicolas Thome
Matthieu Cord
32
12
0
29 Apr 2018
Path Planning in Support of Smart Mobility Applications using Generative Adversarial Networks
M. Mohammadi
Ala I. Al-Fuqaha
Jun-Seok Oh
GAN
11
23
0
23 Apr 2018
Revisiting Small Batch Training for Deep Neural Networks
Dominic Masters
Carlo Luschi
ODL
25
658
0
20 Apr 2018
Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Wenda Zhou
Victor Veitch
Morgane Austern
Ryan P. Adams
Peter Orbanz
35
209
0
16 Apr 2018
DeepFM: An End-to-End Wide & Deep Learning Framework for CTR Prediction
Huifeng Guo
Ruiming Tang
Yunming Ye
Zhenguo Li
Xiuqiang He
Zhenhua Dong
115
64
0
12 Apr 2018
The Loss Surface of XOR Artificial Neural Networks
D. Mehta
Xiaojun Zhao
Edgar A. Bernal
D. Wales
34
19
0
06 Apr 2018
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Marco Baity-Jesi
Levent Sagun
Mario Geiger
S. Spigler
Gerard Ben Arous
C. Cammarota
Yann LeCun
M. Wyart
Giulio Biroli
AI4CE
31
113
0
19 Mar 2018
On the importance of single directions for generalization
Ari S. Morcos
David Barrett
Neil C. Rabinowitz
M. Botvinick
13
328
0
19 Mar 2018
On the insufficiency of existing momentum schemes for Stochastic Optimization
Rahul Kidambi
Praneeth Netrapalli
Prateek Jain
Sham Kakade
ODL
22
117
0
15 Mar 2018
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML
MoMe
37
1,617
0
14 Mar 2018
Essentially No Barriers in Neural Network Energy Landscape
Felix Dräxler
K. Veschgini
M. Salmhofer
Fred Hamprecht
MoMe
20
424
0
02 Mar 2018
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Yoshua Bengio
24
118
0
24 Feb 2018
Characterizing Implicit Bias in Terms of Optimization Geometry
Suriya Gunasekar
Jason D. Lee
Daniel Soudry
Nathan Srebro
AI4CE
35
398
0
22 Feb 2018
The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks
Nicholas Carlini
Chang-rui Liu
Ulfar Erlingsson
Jernej Kos
D. Song
54
1,113
0
22 Feb 2018
Stronger generalization bounds for deep nets via a compression approach
Sanjeev Arora
Rong Ge
Behnam Neyshabur
Yi Zhang
MLT
AI4CE
23
630
0
14 Feb 2018
A Simple Proximal Stochastic Gradient Method for Nonsmooth Nonconvex Optimization
Zhize Li
Jian Li
39
116
0
13 Feb 2018
On Scale-out Deep Learning Training for Cloud and HPC
Srinivas Sridharan
K. Vaidyanathan
Dhiraj D. Kalamkar
Dipankar Das
Mikhail E. Smorkalov
...
Dheevatsa Mudigere
Naveen Mellempudi
Sasikanth Avancha
Bharat Kaul
Pradeep Dubey
BDL
21
30
0
24 Jan 2018
Multi-pseudo Regularized Label for Generated Data in Person Re-Identification
Y. Huang
Jingsong Xu
Qiang Wu
Zhedong Zheng
Zhaoxiang Zhang
Jian Zhang
GAN
13
112
0
21 Jan 2018
Theory of Deep Learning IIb: Optimization Properties of SGD
Chiyuan Zhang
Q. Liao
Alexander Rakhlin
Brando Miranda
Noah Golowich
T. Poggio
ODL
25
71
0
07 Jan 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
95
1,844
0
28 Dec 2017
The exploding gradient problem demystified - definition, prevalence, impact, origin, tradeoffs, and solutions
George Philipp
D. Song
J. Carbonell
ODL
32
46
0
15 Dec 2017
AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
19
136
0
06 Dec 2017
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
41
2,078
0
14 Nov 2017
Scale out for large minibatch SGD: Residual network training on ImageNet-1K with improved accuracy and reduced time to train
V. Codreanu
Damian Podareanu
V. Saletore
36
55
0
12 Nov 2017
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory
Ron Amit
Ron Meir
BDL
MLT
32
173
0
03 Nov 2017
Efficient Training of Convolutional Neural Nets on Large Distributed Systems
Sameer Kumar
D. Sreedhar
Vaibhav Saxena
Yogish Sabharwal
Ashish Verma
33
4
0
02 Nov 2017
Deep Learning as a Mixed Convex-Combinatorial Optimization Problem
A. Friesen
Pedro M. Domingos
26
20
0
31 Oct 2017
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
Charles H. Martin
Michael W. Mahoney
AI4CE
27
62
0
26 Oct 2017
AdaDNNs: Adaptive Ensemble of Deep Neural Networks for Scene Text Recognition
Chun Yang
Xu-Cheng Yin
Zejun Li
Jianwei Wu
Chunchao Guo
Hongfa Wang
Lei Xiao
16
10
0
10 Oct 2017
Neural Optimizer Search with Reinforcement Learning
Irwan Bello
Barret Zoph
Vijay Vasudevan
Quoc V. Le
ODL
29
383
0
21 Sep 2017
Adversarial Networks for Spatial Context-Aware Spectral Image Reconstruction from RGB
Aitor Alvarez-Gila
Joost van de Weijer
Estíbaliz Garrote
GAN
27
90
0
01 Sep 2017
Large Batch Training of Convolutional Networks
Yang You
Igor Gitman
Boris Ginsburg
ODL
16
838
0
13 Aug 2017
Scaling Deep Learning on GPU and Knights Landing clusters
Yang You
A. Buluç
J. Demmel
GNN
15
75
0
09 Aug 2017
Exploring Generalization in Deep Learning
Behnam Neyshabur
Srinadh Bhojanapalli
David A. McAllester
Nathan Srebro
FAtt
62
1,235
0
27 Jun 2017
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth. S. Narayanan
HAI
24
25
0
07 Jun 2017
Spectral Norm Regularization for Improving the Generalizability of Deep Learning
Yuichi Yoshida
Takeru Miyato
22
324
0
31 May 2017
Implicit Regularization in Matrix Factorization
Suriya Gunasekar
Blake E. Woodworth
Srinadh Bhojanapalli
Behnam Neyshabur
Nathan Srebro
8
486
0
25 May 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
32
792
0
24 May 2017
Previous
1
2
3
...
10
11
9
Next