Trade-offs of Local SGD at Scale: An Empirical Study

15 October 2021
Jose Javier Gonzalez Ortiz
Jonathan Frankle
Michael G. Rabbat
Ari S. Morcos
Nicolas Ballas
    FedML
arXiv: 2110.08133

Papers citing "Trade-offs of Local SGD at Scale: An Empirical Study"

19 / 19 papers shown
Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma
Xinzhi Zhang
Man-Chung Yue
Ioannis Mitliagkas
Philipp A. Witte
Russell J. Hewett
Yin Tat Lee
63
0
0
25 Apr 2025
Photon: Federated LLM Pre-Training
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
...
Dongqi Cai
Zexi Li
Wanru Zhao
Xinchi Qiu
Nicholas D. Lane
AI4CE
36
7
0
05 Nov 2024
DEPT: Decoupled Embeddings for Pre-training Language Models
Alex Iacob
Lorenzo Sani
Meghdad Kurmanji
William F. Shen
Xinchi Qiu
Dongqi Cai
Yan Gao
Nicholas D. Lane
VLM
145
0
0
07 Oct 2024
Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning?
Peizhong Ju
Haibo Yang
Jia Liu
Yingbin Liang
Ness B. Shroff
FedML
33
0
0
05 Sep 2024
Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods
Tim Tsz-Kit Lau
Weijian Li
Chenwei Xu
Han Liu
Mladen Kolar
44
1
0
20 Jun 2024
ACCO: Accumulate While You Communicate for Communication-Overlapped Sharded LLM Training
Adel Nabli
Louis Fournier
Pierre Erbacher
Louis Serrano
Eugene Belilovsky
Edouard Oyallon
FedML
46
1
0
03 Jun 2024
WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average
Louis Fournier
Adel Nabli
Masih Aminbeidokhti
M. Pedersoli
Eugene Belilovsky
Edouard Oyallon
MoMe
FedML
46
3
0
27 May 2024
The Future of Large Language Model Pre-training is Federated
Lorenzo Sani
Alexandru Iacob
Zeyu Cao
Bill Marino
Yan Gao
...
Wanru Zhao
William F. Shen
Preslav Aleksandrov
Xinchi Qiu
Nicholas D. Lane
AI4CE
35
13
0
17 May 2024
Asynchronous Local-SGD Training for Language Modeling
Bo Liu
Rachita Chhaparia
Arthur Douillard
Satyen Kale
Andrei A. Rusu
Jiajun Shen
Arthur Szlam
Marc'Aurelio Ranzato
FedML
40
10
0
17 Jan 2024
Can We Learn Communication-Efficient Optimizers?
Charles-Étienne Joseph
Benjamin Thérien
A. Moudgil
Boris Knyazev
Eugene Belilovsky
40
1
0
02 Dec 2023
DiLoCo: Distributed Low-Communication Training of Language Models
Arthur Douillard
Qixuang Feng
Andrei A. Rusu
Rachita Chhaparia
Yani Donchev
A. Kuncoro
Marc'Aurelio Ranzato
Arthur Szlam
Jiajun Shen
58
31
0
14 Nov 2023
A Quadratic Synchronization Rule for Distributed Deep Learning
Xinran Gu
Kaifeng Lyu
Sanjeev Arora
Jingzhao Zhang
Longbo Huang
54
1
0
22 Oct 2023
lo-fi: distributed fine-tuning without communication
Mitchell Wortsman
Suchin Gururangan
Shen Li
Ali Farhadi
Ludwig Schmidt
Michael G. Rabbat
Ari S. Morcos
32
24
0
19 Oct 2022
Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities
Brian Bartoldson
B. Kailkhura
Davis W. Blalock
31
47
0
13 Oct 2022
Federated Optimization Algorithms with Random Reshuffling and Gradient Compression
Abdurakhmon Sadiev
Grigory Malinovsky
Eduard A. Gorbunov
Igor Sokolov
Ahmed Khaled
Konstantin Burlachenko
Peter Richtárik
FedML
16
21
0
14 Jun 2022
Selectivity considered harmful: evaluating the causal impact of class selectivity in DNNs
Matthew L. Leavitt
Ari S. Morcos
58
33
0
03 Mar 2020
Bag of Tricks for Image Classification with Convolutional Neural Networks
Tong He
Zhi Zhang
Hang Zhang
Zhongyue Zhang
Junyuan Xie
Mu Li
221
1,399
0
04 Dec 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
299
2,890
0
15 Sep 2016
Optimal Distributed Online Prediction using Mini-Batches
O. Dekel
Ran Gilad-Bachrach
Ohad Shamir
Lin Xiao
177
683
0
07 Dec 2010