Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl
arXiv:1811.03600 · 8 November 2018
Papers citing "Measuring the Effects of Data Parallelism on Neural Network Training" (7 of 107 shown)
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin, Michael W. Mahoney · AI4CE
02 Oct 2018
Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
22 Aug 2018
Large scale distributed neural network training through online distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton · FedML
09 Apr 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun, Torsten Hoefler · GNN
26 Feb 2018
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
Charles H. Martin, Michael W. Mahoney · AI4CE
26 Oct 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang · ODL
15 Sep 2016
The Effects of Hyperparameters on SGD Training of Neural Networks
Thomas Breuel
12 Aug 2015