Parallel SGD: When does averaging help?

23 June 2016 · arXiv:1606.07365
Jian Zhang, Christopher De Sa, Ioannis Mitliagkas, Christopher Ré
Tags: MoMe, FedML
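
The paper asks when averaging the iterates of independently run SGD workers helps, comparing one-shot averaging at the end of training against more frequent communication. Below is a minimal sketch of that scheme on a synthetic least-squares problem; the toy objective, hyperparameters, and names such as sgd_step, parallel_sgd, and avg_every are illustrative assumptions, not the paper's experimental setup.

import numpy as np

# Minimal sketch of parallel SGD with model averaging on a toy
# least-squares objective; illustrative only, not the paper's code.
rng = np.random.default_rng(0)

# Synthetic linear regression data: y = X @ w_star + noise.
n, d = 1000, 10
w_star = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)

def sgd_step(w, lr):
    """One SGD step on the squared loss of a uniformly sampled example."""
    i = rng.integers(n)
    return w - lr * (X[i] @ w - y[i]) * X[i]

def parallel_sgd(num_workers, num_steps, avg_every, lr=0.01):
    """Run independent SGD chains; average iterates every avg_every steps.

    avg_every == num_steps gives one-shot averaging; smaller values trade
    more communication for lower variance across workers.
    """
    workers = [np.zeros(d) for _ in range(num_workers)]
    for t in range(1, num_steps + 1):
        workers = [sgd_step(w, lr) for w in workers]
        if t % avg_every == 0:
            avg = np.mean(workers, axis=0)
            workers = [avg.copy() for _ in range(num_workers)]
    return np.mean(workers, axis=0)

for avg_every in (1, 100, 1000):  # 1000 steps total, so 1000 = one-shot
    w = parallel_sgd(num_workers=8, num_steps=1000, avg_every=avg_every)
    print(f"avg_every={avg_every:4d}  error={np.linalg.norm(w - w_star):.4f}")

Sweeping avg_every from 1 (average after every step) up to the total step count (one-shot averaging) makes the communication/accuracy trade-off studied here, and in many of the citing papers below, directly visible.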

Papers citing "Parallel SGD: When does averaging help?"

Showing 18 of 68 citing papers.

Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD
Rosa Candela, Giulio Franzese, Maurizio Filippone, Pietro Michiardi
21 Oct 2019

Distributed Learning of Deep Neural Networks using Independent Subnet Training
John Shelton Hyatt, Cameron R. Wolfe, Michael Lee, Yuxin Tang, Anastasios Kyrillidis, Christopher M. Jermaine
Tags: OOD
04 Oct 2019

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
Sebastian U. Stich, Sai Praneeth Karimireddy
Tags: FedML
11 Sep 2019

Decentralized Deep Learning with Arbitrary Communication Compression
Anastasia Koloskova, Tao R. Lin, Sebastian U. Stich, Martin Jaggi
Tags: FedML
22 Jul 2019

Faster Neural Network Training with Data Echoing
Dami Choi, Alexandre Passos, Christopher J. Shallue, George E. Dahl
12 Jul 2019

Distributed Optimization for Over-Parameterized Learning
Chi Zhang, Qianxiao Li
14 Jun 2019

Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Debraj Basu, Deepesh Data, C. Karakuş, Suhas Diggavi
Tags: MQ
06 Jun 2019

Communication trade-offs for synchronized distributed SGD with large step size
Kumar Kshitij Patel, Aymeric Dieuleveut
Tags: FedML
25 Apr 2019

A Distributed Hierarchical SGD Algorithm with Sparse Global Reduction
Fan Zhou, Guojing Cong
12 Mar 2019

Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang, Gauri Joshi
Tags: FedML
19 Oct 2018

Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms
Jianyu Wang, Gauri Joshi
22 Aug 2018

Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
22 Aug 2018

Parallel Restarted SGD with Faster Convergence and Less Communication: Demystifying Why Model Averaging Works for Deep Learning
Hao Yu, Sen Yang, Shenghuo Zhu
Tags: MoMe, FedML
17 Jul 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
Tags: FedML
24 May 2018

Mage: Online Interference-Aware Scheduling in Multi-Scale Heterogeneous Systems
Francisco Romero, Christina Delimitrou
17 Apr 2018

Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, P. Nagpurkar
03 Mar 2018

On the convergence properties of a $K$-step averaging stochastic gradient descent algorithm for nonconvex optimization
Fan Zhou, Guojing Cong
03 Aug 2017

Optimal Distributed Online Prediction using Mini-Batches
O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao
07 Dec 2010