2303.01215
Why (and When) does Local SGD Generalize Better than SGD?
2 March 2023
Xinran Gu
Kaifeng Lyu
Longbo Huang
Sanjeev Arora
Papers citing "Why (and When) does Local SGD Generalize Better than SGD?"
5 / 5 papers shown
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
256
0
0
25 Apr 2025
Revisiting LocalSGD and SCAFFOLD: Improved Rates and Missing Analysis
Ruichen Luo
Sebastian U Stich
Samuel Horváth
Martin Takáč
139
0
0
08 Jan 2025
EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
Jialiang Cheng
Ning Gao
Yun Yue
Zhiling Ye
Jiadi Jiang
Jian Sha
OffRL
158
1
0
10 Dec 2024
The Marginal Value of Momentum for Small Learning Rate SGD
Runzhe Wang
Sadhika Malladi
Tianhao Wang
Kaifeng Lyu
Zhiyuan Li
ODL
82
9
0
27 Jul 2023
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Mingli Song
Dacheng Tao
154
15
0
05 Jun 2023