Parallel SGD: When does averaging help?
Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré
23 June 2016 · MoMe · FedML
Papers citing "Parallel SGD: When does averaging help?"
18 of 68 papers shown
Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD
Rosa Candela, Giulio Franzese, Maurizio Filippone, Pietro Michiardi · 21 Oct 2019
Distributed Learning of Deep Neural Networks using Independent Subnet Training
John Shelton Hyatt, Cameron R. Wolfe, Michael Lee, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine · OOD · 04 Oct 2019
The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
Sebastian U. Stich, Sai Praneeth Karimireddy · FedML · 11 Sep 2019
Decentralized Deep Learning with Arbitrary Communication Compression
Anastasia Koloskova, Tao R. Lin, Sebastian U. Stich, Martin Jaggi · FedML · 22 Jul 2019
Faster Neural Network Training with Data Echoing
Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl · 12 Jul 2019
Distributed Optimization for Over-Parameterized Learning
Chi Zhang, Qianxiao Li · 14 Jun 2019
Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Debraj Basu, Deepesh Data, C. Karakuş, Suhas Diggavi · MQ · 06 Jun 2019
Communication trade-offs for synchronized distributed SGD with large step size
Kumar Kshitij Patel, Aymeric Dieuleveut · FedML · 25 Apr 2019
A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction
Fan Zhou, Guojing Cong · 12 Mar 2019
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang, Gauri Joshi · FedML · 19 Oct 2018
Cooperative SGD: A Unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang, Gauri Joshi · 22 Aug 2018
Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi · 22 Aug 2018
Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
Hao Yu, Sen Yang, Shenghuo Zhu · MoMe · FedML · 17 Jul 2018
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich · FedML · 24 May 2018
Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems
Francisco Romero, Christina Delimitrou · 17 Apr 2018
Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, P. Nagpurkar · 03 Mar 2018
On the convergence properties of a K-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou, Guojing Cong · 03 Aug 2017
Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao · 07 Dec 2010