arXiv: 1606.07365
Parallel SGD: When does averaging help?
23 June 2016
Jian Zhang
Christopher De Sa
Ioannis Mitliagkas
Christopher Ré
Tags: MoMe, FedML
Papers citing "Parallel SGD: When does averaging help?" (50 of 67 shown)
- EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models
  Jialiang Cheng, Ning Gao, Yun Yue, Zhiling Ye, Jiadi Jiang, Jian Sha. Tags: OffRL. 0 citations. 10 Dec 2024.
- FedAQ: Communication-Efficient Federated Edge Learning via Joint Uplink and Downlink Adaptive Quantization
  Linping Qu, Shenghui Song, Chi-Ying Tsui. Tags: MQ, FedML. 4 citations. 26 Jun 2024.
- Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging
  Michail Theologitis, Georgios Frangias, Georgios Anestis, V. Samoladas, Antonios Deligiannakis. Tags: FedML. 0 citations. 31 May 2024.
- The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication
  Kumar Kshitij Patel, Margalit Glasgow, Ali Zindari, Lingxiao Wang, Sebastian U. Stich, Ziheng Cheng, Nirmit Joshi, Nathan Srebro. 6 citations. 19 May 2024.
- Training Neural Networks from Scratch with Parallel Low-Rank Adapters
  Minyoung Huh, Brian Cheung, Jeremy Bernstein, Phillip Isola, Pulkit Agrawal. 10 citations. 26 Feb 2024.
- CO2: Efficient Distributed Training with Full Communication-Computation Overlap
  Weigao Sun, Zhen Qin, Weixuan Sun, Shidi Li, Dong Li, Xuyang Shen, Yu Qiao, Yiran Zhong. Tags: OffRL. 10 citations. 29 Jan 2024.
- Asynchronous Local-SGD Training for Language Modeling
  Bo Liu, Rachita Chhaparia, Arthur Douillard, Satyen Kale, Andrei A. Rusu, Jiajun Shen, Arthur Szlam, Marc'Aurelio Ranzato. Tags: FedML. 10 citations. 17 Jan 2024.
- Mini-batch Gradient Descent with Buffer
  Haobo Qi, Du Huang, Yingqiu Zhu, Danyang Huang, Hansheng Wang. 1 citation. 14 Dec 2023.
- Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat
  Erdong Hu, Yu-Shuen Tang, Anastasios Kyrillidis, C. Jermaine. Tags: FedML. 10 citations. 06 Sep 2023.
- FedDec: Peer-to-peer Aided Federated Learning
  Marina Costantini, Giovanni Neglia, T. Spyropoulos. Tags: FedML. 1 citation. 11 Jun 2023.
- Understanding and Improving Model Averaging in Federated Learning on Heterogeneous Data
  Tailin Zhou, Zehong Lin, Jinchao Zhang, Danny H. K. Tsang. Tags: MoMe, FedML. 12 citations. 13 May 2023.
- Hierarchical Weight Averaging for Deep Neural Networks
  Xiaozhe Gu, Zixun Zhang, Yuncheng Jiang, Tao Luo, Ruimao Zhang, Shuguang Cui, Zhuguo Li. 5 citations. 23 Apr 2023.
- Accelerating Hybrid Federated Learning Convergence under Partial Participation
  Jieming Bian, Lei Wang, Kun Yang, Cong Shen, Jie Xu. Tags: FedML. 11 citations. 10 Apr 2023.
- ABS: Adaptive Bounded Staleness Converges Faster and Communicates Less
  Qiao Tan, Feng Zhu, Jingjing Zhang. 0 citations. 21 Jan 2023.
- On the Performance of Gradient Tracking with Local Updates
  Edward Duc Hien Nguyen, Sulaiman A. Alghunaim, Kun Yuan, César A. Uribe. 19 citations. 10 Oct 2022.
- Parallel and Streaming Wavelet Neural Networks for Classification and Regression under Apache Spark
  E Venkatesh, Yelleti Vivek, V. Ravi, Shiva Shankar Orsu. 6 citations. 07 Sep 2022.
- Distributed Evolution Strategies for Black-box Stochastic Optimization
  Xiaoyu He, Zibin Zheng, Chuan Chen, Yuren Zhou, Chuan Luo, Qingwei Lin. 4 citations. 09 Apr 2022.
- Scaling the Wild: Decentralizing Hogwild!-style Shared-memory SGD
  Bapi Chatterjee, Vyacheslav Kungurtsev, Dan Alistarh. Tags: FedML. 2 citations. 13 Mar 2022.
- ProxSkip: Yes! Local Gradient Steps Provably Lead to Communication Acceleration! Finally!
  Konstantin Mishchenko, Grigory Malinovsky, Sebastian U. Stich, Peter Richtárik. 149 citations. 18 Feb 2022.
- On the Convergence of Shallow Neural Network Training with Randomly Masked Neurons
  Fangshuo Liao, Anastasios Kyrillidis. 16 citations. 05 Dec 2021.
- Large-Scale Deep Learning Optimizations: A Comprehensive Survey
  Xiaoxin He, Fuzhao Xue, Xiaozhe Ren, Yang You. 14 citations. 01 Nov 2021.
- Trade-offs of Local SGD at Scale: An Empirical Study
  Jose Javier Gonzalez Ortiz, Jonathan Frankle, Michael G. Rabbat, Ari S. Morcos, Nicolas Ballas. Tags: FedML. 19 citations. 15 Oct 2021.
- Local SGD Optimizes Overparameterized Neural Networks in Polynomial Time
  Yuyang Deng, Mohammad Mahdi Kamani, M. Mahdavi. Tags: FedML. 14 citations. 22 Jul 2021.
- ResIST: Layer-Wise Decomposition of ResNets for Distributed Training
  Chen Dun, Cameron R. Wolfe, C. Jermaine, Anastasios Kyrillidis. 21 citations. 02 Jul 2021.
- Communication-efficient SGD: From Local SGD to One-Shot Averaging
  Artin Spiridonoff, Alexander Olshevsky, I. Paschalidis. Tags: FedML. 20 citations. 09 Jun 2021.
- Accelerating Gossip SGD with Periodic Global Averaging
  Yiming Chen, Kun Yuan, Yingya Zhang, Pan Pan, Yinghui Xu, W. Yin. 41 citations. 19 May 2021.
- CrossoverScheduler: Overlapping Multiple Distributed Training Applications in a Crossover Manner
  Cheng Luo, L. Qu, Youshan Miao, Peng Cheng, Y. Xiong. 0 citations. 14 Mar 2021.
- FedDR -- Randomized Douglas-Rachford Splitting Algorithms for Nonconvex Federated Composite Optimization
  Quoc Tran-Dinh, Nhan H. Pham, Dzung Phan, Lam M. Nguyen. Tags: FedML. 39 citations. 05 Mar 2021.
- Oscars: Adaptive Semi-Synchronous Parallel Model for Distributed Deep Learning with Global View
  Sheng-Jun Huang. 0 citations. 17 Feb 2021.
- Truly Sparse Neural Networks at Scale
  Selima Curci, D. Mocanu, Mykola Pechenizkiy. 19 citations. 02 Feb 2021.
- FedSKETCH: Communication-Efficient and Private Federated Learning via Sketching
  Farzin Haddadpour, Belhal Karimi, Ping Li, Xiaoyun Li. Tags: FedML. 31 citations. 11 Aug 2020.
- Multi-Level Local SGD for Heterogeneous Hierarchical Networks
  Timothy Castiglia, Anirban Das, S. Patterson. 13 citations. 27 Jul 2020.
- DBS: Dynamic Batch Size For Distributed Deep Neural Network Training
  Qing Ye, Yuhao Zhou, Mingjia Shi, Yanan Sun, Jiancheng Lv. 11 citations. 23 Jul 2020.
- Adaptive Periodic Averaging: A Practical Approach to Reducing Communication in Distributed Learning
  Peng Jiang, G. Agrawal. 5 citations. 13 Jul 2020.
- Attack of the Tails: Yes, You Really Can Backdoor Federated Learning
  Hongyi Wang, Kartik K. Sreenivasan, Shashank Rajput, Harit Vishwakarma, Saurabh Agarwal, Jy-yong Sohn, Kangwook Lee, Dimitris Papailiopoulos. Tags: FedML. 589 citations. 09 Jul 2020.
- Federated Learning with Compression: Unified Analysis and Sharp Guarantees
  Farzin Haddadpour, Mohammad Mahdi Kamani, Aryan Mokhtari, M. Mahdavi. Tags: FedML. 271 citations. 02 Jul 2020.
- STL-SGD: Speeding Up Local SGD with Stagewise Communication Period
  Shuheng Shen, Yifei Cheng, Jingchang Liu, Linli Xu. Tags: LRM. 7 citations. 11 Jun 2020.
- Minibatch vs Local SGD for Heterogeneous Distributed Learning
  Blake E. Woodworth, Kumar Kshitij Patel, Nathan Srebro. Tags: FedML. 198 citations. 08 Jun 2020.
- Local SGD With a Communication Overhead Depending Only on the Number of Workers
  Artin Spiridonoff, Alexander Olshevsky, I. Paschalidis. Tags: FedML. 19 citations. 03 Jun 2020.
- Weighted Aggregating Stochastic Gradient Descent for Parallel Deep Learning
  Pengzhan Guo, Zeyang Ye, Keli Xiao, Wei Zhu. 14 citations. 07 Apr 2020.
- Differentially Private Federated Learning for Resource-Constrained Internet of Things
  Rui Hu, Yuanxiong Guo, E. Ratazzi, Yanmin Gong. Tags: FedML. 17 citations. 28 Mar 2020.
- A Hybrid-Order Distributed SGD Method for Non-Convex Optimization to Balance Communication Overhead, Computational Complexity, and Convergence Rate
  Naeimeh Omidvar, M. Maddah-ali, Hamed Mahdavi. Tags: ODL. 3 citations. 27 Mar 2020.
- Communication-Efficient Distributed Deep Learning: A Comprehensive Survey
  Zhenheng Tang, S. Shi, Wei Wang, Bo-wen Li, Xiaowen Chu. 48 citations. 10 Mar 2020.
- Communication optimization strategies for distributed deep neural network training: A survey
  Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao. 12 citations. 06 Mar 2020.
- Is Local SGD Better than Minibatch SGD?
  Blake E. Woodworth, Kumar Kshitij Patel, Sebastian U. Stich, Zhen Dai, Brian Bullins, H. B. McMahan, Ohad Shamir, Nathan Srebro. Tags: FedML. 253 citations. 18 Feb 2020.
- Stochastic Weight Averaging in Parallel: Large-Batch Training that Generalizes Well
  Vipul Gupta, S. Serrano, D. DeCoste. Tags: MoMe. 55 citations. 07 Jan 2020.
- Parallel Restarted SPIDER -- Communication Efficient Distributed Nonconvex Optimization with Optimal Computation Complexity
  Pranay Sharma, Swatantra Kafle, Prashant Khanduri, Saikiran Bulusu, K. Rajawat, P. Varshney. Tags: FedML. 17 citations. 12 Dec 2019.
- On the Convergence of Local Descent Methods in Federated Learning
  Farzin Haddadpour, M. Mahdavi. Tags: FedML. 266 citations. 31 Oct 2019.
- Local SGD with Periodic Averaging: Tighter Analysis and Adaptive Synchronization
  Farzin Haddadpour, Mohammad Mahdi Kamani, M. Mahdavi, V. Cadambe. Tags: FedML. 199 citations. 30 Oct 2019.
- Sparsification as a Remedy for Staleness in Distributed Asynchronous SGD
  Rosa Candela, Giulio Franzese, Maurizio Filippone, Pietro Michiardi. 1 citation. 21 Oct 2019.