1609.04836
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
15 September 2016
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
Papers citing
"On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima"
14 / 514 papers shown
Title
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia C. Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
20
1,012
0
23 May 2017
Practical Processing of Mobile Sensor Data for Continual Deep Learning Predictions
Kleomenis Katevas
Ilias Leontiadis
M. Pielot
Joan Serrà
HAI
11
12
0
17 May 2017
Nonlinear Information Bottleneck
Artemy Kolchinsky
Brendan D. Tracey
David Wolpert
12
152
0
06 May 2017
Snapshot Ensembles: Train 1, get M for free
Gao Huang
Yixuan Li
Geoff Pleiss
Zhuang Liu
J. Hopcroft
Kilian Q. Weinberger
OOD
FedML
UQCV
45
935
0
01 Apr 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data
Gintare Karolina Dziugaite
Daniel M. Roy
50
799
0
31 Mar 2017
Sharp Minima Can Generalize For Deep Nets
Laurent Dinh
Razvan Pascanu
Samy Bengio
Yoshua Bengio
ODL
46
755
0
15 Mar 2017
Langevin Dynamics with Continuous Tempering for Training Deep Neural Networks
Nanyang Ye
Zhanxing Zhu
Rafał K. Mantiuk
16
20
0
13 Mar 2017
Data-Dependent Stability of Stochastic Gradient Descent
Ilja Kuzborskij
Christoph H. Lampert
MLT
9
165
0
05 Mar 2017
Incorporating Global Visual Features into Attention-Based Neural Machine Translation
Iacer Calixto
Qun Liu
Nick Campbell
24
154
0
23 Jan 2017
Incremental Sequence Learning
E. Jong
CLL
24
5
0
09 Nov 2016
Big Batch SGD: Automated Inference using Adaptive Batch Sizes
Soham De
A. Yadav
David Jacobs
Tom Goldstein
ODL
14
62
0
18 Oct 2016
Distributed Training of Deep Neural Networks: Theoretical and Practical Limits of Parallel Scalability
J. Keuper
Franz-Josef Pfreundt
GNN
55
97
0
22 Sep 2016
Parallelizing Word2Vec in Shared and Distributed Memory
Shihao Ji
N. Satish
Sheng Li
Pradeep Dubey
VLM
MoE
14
72
0
15 Apr 2016
The Loss Surfaces of Multilayer Networks
A. Choromańska
Mikael Henaff
Michaël Mathieu
Gerard Ben Arous
Yann LeCun
ODL
179
1,185
0
30 Nov 2014