
Measuring the Effects of Data Parallelism on Neural Network Training

8 November 2018
Christopher J. Shallue, Jaehoon Lee, Joseph Antognini, Jascha Sohl-Dickstein, Roy Frostig, George E. Dahl

Papers citing "Measuring the Effects of Data Parallelism on Neural Network Training"

7 / 107 papers shown
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin, Michael W. Mahoney
AI4CE · 38 · 191 · 0 · 02 Oct 2018

Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
57 · 429 · 0 · 22 Aug 2018

Large scale distributed neural network training through online distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton
FedML · 278 · 404 · 0 · 09 Apr 2018

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun, Torsten Hoefler
GNN · 33 · 702 · 0 · 26 Feb 2018

Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
Charles H. Martin, Michael W. Mahoney
AI4CE · 30 · 62 · 0 · 26 Oct 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL · 308 · 2,890 · 0 · 15 Sep 2016

The Effects of Hyperparameters on SGD Training of Neural Networks
Thomas Breuel
72 · 63 · 0 · 12 Aug 2015