Parallel SGD: When does averaging help?
Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré
23 June 2016 · MoMe · FedML
Papers citing "Parallel SGD: When does averaging help?"
18 of 68 papers shown
Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD
Rosa Candela, Giulio Franzese, Maurizio Filippone, Pietro Michiardi · 21 Oct 2019
Distributed Learning of Deep Neural Networks using Independent Subnet Training
John Shelton Hyatt, Cameron R. Wolfe, Michael Lee, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine · OOD · 04 Oct 2019
The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
Sebastian U. Stich, Sai Praneeth Karimireddy · FedML · 11 Sep 2019
Decentralized Deep Learning with Arbitrary Communication Compression
Anastasia Koloskova, Tao R. Lin, Sebastian U. Stich, Martin Jaggi · FedML · 22 Jul 2019
Faster Neural Network Training with Data Echoing
Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl · 12 Jul 2019
Distributed Optimization for Over-Parameterized Learning
Chi Zhang, Qianxiao Li · 14 Jun 2019
Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Debraj Basu, Deepesh Data, C. Karakuş, Suhas Diggavi · MQ · 06 Jun 2019
Communication trade-offs for synchronized distributed SGD with large step size
Kumar Kshitij Patel, Aymeric Dieuleveut · FedML · 25 Apr 2019
A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction
Fan Zhou, Guojing Cong · 12 Mar 2019
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang, Gauri Joshi · FedML · 19 Oct 2018
Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang, Gauri Joshi · 22 Aug 2018
Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi · 22 Aug 2018
Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
Hao Yu, Sen Yang, Shenghuo Zhu · MoMe · FedML · 17 Jul 2018
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich · FedML · 24 May 2018
Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems
Francisco Romero, Christina Delimitrou · 17 Apr 2018
Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, P. Nagpurkar · 03 Mar 2018
On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou, Guojing Cong · 03 Aug 2017
Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao · 07 Dec 2010