2504.18454
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
25 April 2025
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
Papers cited by
"Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training"
28 / 28 papers shown
Title
Asynchronous Local-SGD Training for Language Modeling
Bo Liu
Rachita Chhaparia
Arthur Douillard
Satyen Kale
Andrei A. Rusu
Jiajun Shen
Arthur Szlam
Marc'Aurelio Ranzato
FedML
90
11
0
17 Jan 2024
DiLoCo: Distributed Low-Communication Training of Language Models
Arthur Douillard
Qixuan Feng
Andrei A. Rusu
Rachita Chhaparia
Yani Donchev
A. Kuncoro
Marc'Aurelio Ranzato
Arthur Szlam
Jiajun Shen
144
40
0
14 Nov 2023
TinyStories: How Small Can Language Models Be and Still Speak Coherent English?
Ronen Eldan
Yuanzhi Li
SyDa
LRM
92
267
0
12 May 2023
TimelyFL: Heterogeneity-aware Asynchronous Federated Learning with Adaptive Partial Training
Tuo Zhang
Lei Gao
Sunwoo Lee
Mi Zhang
Salman Avestimehr
FedML
96
30
0
14 Apr 2023
Why (and When) does Local SGD Generalize Better than SGD?
Xinran Gu
Kaifeng Lyu
Longbo Huang
Sanjeev Arora
70
28
0
02 Mar 2023
Personalized Federated Learning with Communication Compression
El Houcine Bergou
Konstantin Burlachenko
Aritra Dutta
Peter Richtárik
FedML
119
10
0
12 Sep 2022
ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
Konstantin Mishchenko
Grigory Malinovsky
Sebastian U. Stich
Peter Richtárik
61
156
0
18 Feb 2022
Trade-offs of Local SGD at Scale: An Empirical Study
Jose Javier Gonzalez Ortiz
Jonathan Frankle
Michael G. Rabbat
Ari S. Morcos
Nicolas Ballas
FedML
86
18
0
15 Oct 2021
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Max Ryabinin
Eduard A. Gorbunov
Vsevolod Plokhotnyuk
Gennady Pekhimenko
133
35
0
04 Mar 2021
A Unified Theory of Decentralized SGD with Changing Topology and Local Updates
Anastasia Koloskova
Nicolas Loizou
Sadra Boreiri
Martin Jaggi
Sebastian U. Stich
FedML
95
517
0
23 Mar 2020
Adaptive Federated Optimization
Sashank J. Reddi
Zachary B. Charles
Manzil Zaheer
Zachary Garrett
Keith Rush
Jakub Konečný
Sanjiv Kumar
H. B. McMahan
FedML
192
1,458
0
29 Feb 2020
Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
Farzin Haddadpour
Mohammad Mahdi Kamani
M. Mahdavi
V. Cadambe
FedML
87
202
0
30 Oct 2019
MLPerf Training Benchmark
Peter Mattson
Christine Cheng
Cody Coleman
Greg Diamos
Paulius Micikevicius
...
Carole-Jean Wu
Lingjie Xu
Masafumi Yamazaki
C. Young
Matei A. Zaharia
109
315
0
02 Oct 2019
SlowMo: Improving Communication-Efficient Distributed SGD with Slow Momentum
Jianyu Wang
Vinayak Tantia
Nicolas Ballas
Michael G. Rabbat
91
201
0
01 Oct 2019
Asynchronous Federated Optimization
Cong Xie
Oluwasanmi Koyejo
Indranil Gupta
FedML
92
574
0
10 Mar 2019
Federated Optimization in Heterogeneous Networks
Tian Li
Anit Kumar Sahu
Manzil Zaheer
Maziar Sanjabi
Ameet Talwalkar
Virginia Smith
FedML
264
5,283
0
14 Dec 2018
Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin
Sebastian U. Stich
Kumar Kshitij Patel
Martin Jaggi
123
432
0
22 Aug 2018
Negative Momentum for Improved Game Dynamics
Gauthier Gidel
Reyhane Askari Hemmat
Mohammad Pezeshki
Rémi Le Priol
Gabriel Huang
Simon Lacoste-Julien
Ioannis Mitliagkas
AI4CE
99
181
0
12 Jul 2018
Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
205
1,072
0
24 May 2018
Asynchronous Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian
Wei Zhang
Ce Zhang
Ji Liu
ODL
65
500
0
18 Oct 2017
Asynchronous Stochastic Gradient Descent with Delay Compensation
Shuxin Zheng
Qi Meng
Taifeng Wang
Wei Chen
Nenghai Yu
Zhiming Ma
Tie-Yan Liu
148
316
0
27 Sep 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
541
2,947
0
15 Sep 2016
Communication-Efficient Learning of Deep Networks from Decentralized Data
H. B. McMahan
Eider Moore
Daniel Ramage
S. Hampson
Blaise Agüera y Arcas
FedML
463
17,727
0
17 Feb 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
2.4K
195,011
0
10 Dec 2015
Asynchronous Parallel Stochastic Gradient for Nonconvex Optimization
Xiangru Lian
Yijun Huang
Y. Li
Ji Liu
151
499
0
27 Jun 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
2.3K
150,586
0
22 Dec 2014
Deep learning with Elastic Averaging SGD
Sixin Zhang
A. Choromańska
Yann LeCun
FedML
112
611
0
20 Dec 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
1.8K
100,713
0
04 Sep 2014