ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv: 2504.18454 · Cited By
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
25 April 2025
Hiroki Naganuma, Xinzhi Zhang, Man-Chung Yue, Ioannis Mitliagkas, Philipp A. Witte, Russell J. Hewett, Yin Tat Lee

Papers citing "Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training"

28 papers shown
Asynchronous Local-SGD Training for Language Modeling
Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato
FedML · 90 / 11 / 0 · 17 Jan 2024

DiLoCo: Distributed Low-Communication Training of Language Models
Arthur Douillard, Qixuang Feng, Andrei A. Rusu, Rachita Chhaparia, Yani Donchev, A. Kuncoro, Marc'Aurelio Ranzato, Arthur Szlam, Jiajun Shen
144 / 40 / 0 · 14 Nov 2023

TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan, Yuanzhi Li
SyDa, LRM · 92 / 267 / 0 · 12 May 2023

TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training
Tuo Zhang, Lei Gao, Sunwoo Lee, Mi Zhang, Salman Avestimehr
FedML · 96 / 30 / 0 · 14 Apr 2023

Why (and When) does Local SGD Generalize Better than SGD?
Xinran Gu, Kaifeng Lyu, Longbo Huang, Sanjeev Arora
70 / 28 / 0 · 02 Mar 2023

Personalized Federated Learning with Communication Compression
El Houcine Bergou, Konstantin Burlachenko, Aritra Dutta, Peter Richtárik
FedML · 119 / 10 / 0 · 12 Sep 2022

ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
Konstantin Mishchenko, Grigory Malinovsky, Sebastian U. Stich, Peter Richtárik
61 / 156 / 0 · 18 Feb 2022

Trade-offs of Local SGD at Scale: An Empirical Study
Jose Javier Gonzalez Ortiz, Jonathan Frankle, Michael G. Rabbat, Ari S. Morcos, Nicolas Ballas
FedML · 86 / 18 / 0 · 15 Oct 2021

Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Max Ryabinin, Eduard A. Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko
133 / 35 / 0 · 04 Mar 2021

A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Anastasia Koloskova, Nicolas Loizou, Sadra Boreiri, Martin Jaggi, Sebastian U. Stich
FedML · 95 / 517 / 0 · 23 Mar 2020

Adaptive Federated Optimization
Sashank J. Reddi, Zachary B. Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konecný, Sanjiv Kumar, H. B. McMahan
FedML · 192 / 1,458 / 0 · 29 Feb 2020

Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
Farzin Haddadpour, Mohammad Mahdi Kamani, M. Mahdavi, V. Cadambe
FedML · 87 / 202 / 0 · 30 Oct 2019

MLPerf Training Benchmark
Arya D. McCarthy, Christine Cheng, Cody Coleman, Greg Diamos, Paulius Micikevicius, ..., Carole-Jean Wu, Lingjie Xu, Masafumi Yamazaki, C. Young, Matei A. Zaharia
109 / 315 / 0 · 02 Oct 2019

SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Jianyu Wang, Vinayak Tantia, Nicolas Ballas, Michael G. Rabbat
91 / 201 / 0 · 01 Oct 2019

Asynchronous Federated Optimization
Cong Xie, Oluwasanmi Koyejo, Indranil Gupta
FedML · 92 / 574 / 0 · 10 Mar 2019

Federated Optimization in Heterogeneous Networks
Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, Virginia Smith
FedML · 264 / 5,283 / 0 · 14 Dec 2018

Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin, Sebastian U. Stich, Kumar Kshitij Patel, Martin Jaggi
123 / 432 / 0 · 22 Aug 2018

Negative Momentum for Improved Game Dynamics
Gauthier Gidel, Reyhane Askari Hemmat, Mohammad Pezeshki, Rémi Le Priol, Gabriel Huang, Simon Lacoste-Julien, Ioannis Mitliagkas
AI4CE · 99 / 181 / 0 · 12 Jul 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML · 205 / 1,072 / 0 · 24 May 2018

Asynchronous Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian, Wei Zhang, Ce Zhang, Ji Liu
ODL · 65 / 500 / 0 · 18 Oct 2017

Asynchronous Stochastic Gradient Descent with Delay Compensation
Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhiming Ma, Tie-Yan Liu
148 / 316 / 0 · 27 Sep 2016

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL · 541 / 2,947 / 0 · 15 Sep 2016

Communication-Efficient Learning of Deep Networks from Decentralized Data
H. B. McMahan, Eider Moore, Daniel Ramage, S. Hampson, Blaise Agüera y Arcas
FedML · 463 / 17,727 / 0 · 17 Feb 2016

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
MedIm · 2.4K / 195,011 / 0 · 10 Dec 2015

Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
Xiangru Lian, Yijun Huang, Y. Li, Ji Liu
151 / 499 / 0 · 27 Jun 2015

Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
ODL · 2.3K / 150,586 / 0 · 22 Dec 2014

Deep learning with Elastic Averaging SGD
Sixin Zhang, A. Choromańska, Yann LeCun
FedML · 112 / 611 / 0 · 20 Dec 2014

Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan, Andrew Zisserman
FAtt, MDE · 1.8K / 100,713 / 0 · 04 Sep 2014
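The cited paper and much of the work listed above revolve around Local SGD: each worker takes several local gradient steps between synchronizations and parameters are averaged only periodically, trading a little staleness for much less communication. A minimal sketch of that idea on a toy 1-D least-squares problem (all names, values, and the problem itself are illustrative, not taken from any paper above):

```python
import random

def local_sgd(num_workers=4, rounds=10, local_steps=8, lr=0.1, seed=0):
    """Toy Local SGD: periodic parameter averaging across workers."""
    rng = random.Random(seed)
    # Each worker holds its own data: targets scattered around a shared optimum 2.0.
    data = [[2.0 + rng.gauss(0, 0.1) for _ in range(32)] for _ in range(num_workers)]
    w = 0.0  # shared model parameter
    for _ in range(rounds):
        local = []
        for k in range(num_workers):
            wk = w  # every worker starts from the last synchronized parameter
            for _ in range(local_steps):
                y = rng.choice(data[k])
                wk -= lr * (wk - y)  # SGD step on the loss 0.5 * (wk - y)**2
            local.append(wk)
        w = sum(local) / num_workers  # communicate once per round: average
    return w

print(local_sgd())  # converges toward the shared optimum near 2.0
```

The single averaging step per round is the whole communication budget; a fully synchronous SGD baseline would instead exchange gradients after every one of the `rounds * local_steps` updates.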