Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD (arXiv:1803.01113)

3 March 2018
Sanghamitra Dutta, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, P. Nagpurkar

Papers citing "Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD"

50 of 101 citing papers shown.
SpreadGNN: Serverless Multi-task Federated Learning for Graph Neural Networks. Chaoyang He, Emir Ceyani, Keshav Balasubramanian, M. Annavaram, Salman Avestimehr. 04 Jun 2021. [FedML]
Coded Gradient Aggregation: A Tradeoff Between Communication Costs at Edge Nodes and at Helper Nodes. B. Sasidharan, Anoop Thomas. 06 May 2021.
Secure and Efficient Federated Learning Through Layering and Sharding Blockchain. Shuo Yuan, Bin Cao, Yaohua Sun, Zhiguo Wan, M. Peng. 27 Apr 2021.
Sync-Switch: Hybrid Parameter Synchronization for Distributed Deep Learning. Shijian Li, Oren Mangoubi, Lijie Xu, Tian Guo. 16 Apr 2021.
The Gradient Convergence Bound of Federated Multi-Agent Reinforcement Learning with Efficient Communication. Xing Xu, Rongpeng Li, Zhifeng Zhao, Honggang Zhang. 24 Mar 2021.
Gradient Coding with Dynamic Clustering for Straggler-Tolerant Distributed Learning. Baturalp Buyukates, Emre Ozfatura, S. Ulukus, Deniz Gunduz. 01 Mar 2021.
On Gradient Coding with Partial Recovery. Sahasrajit Sarmasarkar, V. Lalitha, Nikhil Karamchandani. 19 Feb 2021.
Consistent Lock-free Parallel Stochastic Gradient Descent for Fast and Stable Convergence. Karl Bäckström, Ivan Walulya, Marina Papatriantafilou, P. Tsigas. 17 Feb 2021.
Linear Convergence in Federated Learning: Tackling Client Heterogeneity and Sparse Gradients. A. Mitra, Rayana H. Jaafar, George J. Pappas, Hamed Hassani. 14 Feb 2021. [FedML]
Anytime Minibatch with Delayed Gradients. H. Al-Lawati, S. Draper. 15 Dec 2020.
SGD_Tucker: A Novel Stochastic Optimization Strategy for Parallel Sparse Tucker Decomposition. Hao Li, Zixuan Li, KenLi Li, Jan S. Rellermeyer, L. Chen, Keqin Li. 07 Dec 2020.
Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification. Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos. 29 Oct 2020.
Diversity/Parallelism Trade-off in Distributed Systems with Redundancy. Pei Peng, E. Soljanin, P. Whiting. 05 Oct 2020.
VAFL: a Method of Vertical Asynchronous Federated Learning. Tianyi Chen, Xiao Jin, Yuejiao Sun, W. Yin. 12 Jul 2020. [FedML]
Anytime MiniBatch: Exploiting Stragglers in Online Distributed Optimization. Nuwan S. Ferdinand, H. Al-Lawati, S. Draper, M. Nokleby. 10 Jun 2020.
A Distributed Multi-GPU System for Large-Scale Node Embedding at Tencent. Wanjing Wei, Yangzihao Wang, Ping Gao, Shijie Sun, Donghai Yu. 28 May 2020. [GNN]
Breaking (Global) Barriers in Parallel Stochastic Optimization with Wait-Avoiding Group Averaging. Shigang Li, Tal Ben-Nun, Giorgi Nadiradze, Salvatore Di Girolamo, Nikoli Dryden, Dan Alistarh, Torsten Hoefler. 30 Apr 2020.
Dynamic backup workers for parallel machine learning. Chuan Xu, Giovanni Neglia, Nicola Sebastianelli. 30 Apr 2020.
Straggler-aware Distributed Learning: Communication Computation Latency Trade-off. Emre Ozfatura, S. Ulukus, Deniz Gunduz. 10 Apr 2020.
Machine Learning on Volatile Instances. Xiaoxi Zhang, Jianyu Wang, Gauri Joshi, Carlee Joe-Wong. 12 Mar 2020.
Adaptive Distributed Stochastic Gradient Descent for Minimizing Delay in the Presence of Stragglers. Serge Kas Hanna, Rawad Bitar, Parimal Parag, Venkateswara Dasari, S. E. Rouayheb. 25 Feb 2020.
Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD. Jianyu Wang, Hao Liang, Gauri Joshi. 21 Feb 2020.
Reliable Distributed Clustering with Redundant Data Assignment. V. Gandikota, A. Mazumdar, A. S. Rawat. 20 Feb 2020.
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts. Max Ryabinin, Anton I. Gusev. 10 Feb 2020. [FedML]
Advances and Open Problems in Federated Learning. Peter Kairouz, H. B. McMahan, Brendan Avent, A. Bellet, M. Bennis, ..., Zheng Xu, Qiang Yang, Felix X. Yu, Han Yu, Sen Zhao. 10 Dec 2019. [FedML, AI4CE]
Stigmergic Independent Reinforcement Learning for Multi-Agent Collaboration. Xing Xu, Rongpeng Li, Zhifeng Zhao, Honggang Zhang. 28 Nov 2019.
Local AdaAlter: Communication-Efficient Stochastic Gradient Descent with Adaptive Learning Rates. Cong Xie, Oluwasanmi Koyejo, Indranil Gupta, Yanghua Peng. 20 Nov 2019.
Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD. Rosa Candela, Giulio Franzese, Maurizio Filippone, Pietro Michiardi. 21 Oct 2019.
SAFA: a Semi-Asynchronous Protocol for Fast Federated Learning with Low Overhead. A. Masullo, Ligang He, Toby Perrett, Rui Mao, Carsten Maple, Majid Mirmehdi. 03 Oct 2019.
SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum. Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael G. Rabbat. 01 Oct 2019.
Distributed SGD Generalizes Well Under Asynchrony. Jayanth Reddy Regatti, Gaurav Tendolkar, Yi Zhou, Abhishek Gupta, Yingbin Liang. 29 Sep 2019. [FedML]
At Stability's Edge: How to Adjust Hyperparameters to Preserve Minima Selection in Asynchronous Training of Neural Networks? Niv Giladi, Mor Shpigel Nacson, Elad Hoffer, Daniel Soudry. 26 Sep 2019.
FreeLB: Enhanced Adversarial Training for Natural Language Understanding. Chen Zhu, Yu Cheng, Zhe Gan, S. Sun, Tom Goldstein, Jingjing Liu. 25 Sep 2019. [AAML]
Gap Aware Mitigation of Gradient Staleness. Saar Barkai, Ido Hakimi, Assaf Schuster. 24 Sep 2019.
Addressing Algorithmic Bottlenecks in Elastic Machine Learning with Chicle. Michael Kaufmann, K. Kourtis, Celestine Mendler-Dünner, Adrian Schüpbach, Thomas Parnell. 11 Sep 2019.
Distributed Equivalent Substitution Training for Large-Scale Recommender Systems. Haidong Rong, Yangzihao Wang, Feihu Zhou, Junjie Zhai, Haiyang Wu, ..., Fan Li, Han Zhang, Yuekui Yang, Zhenyu Guo, Di Wang. 10 Sep 2019. [OffRL]
Motivating Workers in Federated Learning: A Stackelberg Game Perspective. Y. Sarikaya, Ozgur Ercetin. 06 Aug 2019. [FedML]
Taming Momentum in a Distributed Asynchronous Environment. Ido Hakimi, Saar Barkai, Moshe Gabel, Assaf Schuster. 26 Jul 2019.
Robust and Communication-Efficient Collaborative Learning. Amirhossein Reisizadeh, Hossein Taheri, Aryan Mokhtari, Hamed Hassani, Ramtin Pedarsani. 24 Jul 2019.
Making Asynchronous Stochastic Gradient Descent Work for Transformers. Alham Fikri Aji, Kenneth Heafield. 08 Jun 2019.
MATCHA: Speeding Up Decentralized SGD via Matching Decomposition Sampling. Jianyu Wang, Anit Kumar Sahu, Zhouyi Yang, Gauri Joshi, S. Kar. 23 May 2019.
LAGC: Lazily Aggregated Gradient Coding for Straggler-Tolerant and Communication-Efficient Distributed Learning. Jingjing Zhang, Osvaldo Simeone. 22 May 2019.
Zeno++: Robust Fully Asynchronous SGD. Cong Xie, Oluwasanmi Koyejo, Indranil Gupta. 17 Mar 2019. [FedML]
Speeding up Deep Learning with Transient Servers. Shijian Li, R. Walls, Lijie Xu, Tian Guo. 28 Feb 2019.
Gradient Scheduling with Global Momentum for Non-IID Data Distributed Asynchronous Training. Chengjie Li, Ruixuan Li, Yining Qi, Yuhua Li, Pan Zhou, Song Guo, Keqin Li. 21 Feb 2019.
Machine Learning at the Wireless Edge: Distributed Stochastic Gradient Descent Over-the-Air. Mohammad Mohammadi Amiri, Deniz Gunduz. 03 Jan 2019.
Distributed Gradient Descent with Coded Partial Gradient Computations. Emre Ozfatura, S. Ulukus, Deniz Gunduz. 22 Nov 2018.
Computation Scheduling for Distributed Machine Learning with Straggling Workers. Mohammad Mohammadi Amiri, Deniz Gunduz. 23 Oct 2018. [FedML]
Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD. Jianyu Wang, Gauri Joshi. 19 Oct 2018. [FedML]
Speeding Up Distributed Gradient Descent by Utilizing Non-persistent Stragglers. Emre Ozfatura, Deniz Gunduz, S. Ulukus. 07 Aug 2018.