Deep learning with Elastic Averaging SGD
arXiv:1412.6651 · 20 December 2014
Sixin Zhang, A. Choromańska, Yann LeCun
Community: FedML
Links: arXiv · PDF · HTML
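For context while skimming the listing: the paper's EASGD scheme has each worker take a local gradient step plus an elastic pull toward a shared center variable, while the center moves toward the average of the workers. Below is a minimal sketch of the synchronous update on a toy problem; the least-squares objective, the single-process simulation of "workers", and all hyperparameter values are illustrative assumptions, not the paper's experimental setup.

```python
# Minimal illustrative sketch of synchronous Elastic Averaging SGD (EASGD)
# on a toy least-squares problem. Everything below (data, hyperparameters,
# single-process worker loop) is an assumption made for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Toy data: each worker holds its own shard of a linear-regression problem.
n_workers, n_samples, dim = 4, 256, 10
w_true = rng.normal(size=dim)
shards = []
for _ in range(n_workers):
    X = rng.normal(size=(n_samples, dim))
    y = X @ w_true + 0.1 * rng.normal(size=n_samples)
    shards.append((X, y))

def grad(w, X, y):
    """Gradient of the mean-squared-error loss for the linear model X @ w."""
    return X.T @ (X @ w - y) / len(y)

eta, rho = 0.05, 0.5      # learning rate and elastic penalty (assumed values)
alpha = eta * rho          # elastic step size

center = np.zeros(dim)                                 # center variable
workers = [np.zeros(dim) for _ in range(n_workers)]    # local variables

for step in range(200):
    new_workers = []
    elastic_sum = np.zeros(dim)
    for w_i, (X, y) in zip(workers, shards):
        diff = w_i - center
        # Local update: gradient step plus elastic pull toward the center.
        new_workers.append(w_i - eta * grad(w_i, X, y) - alpha * diff)
        elastic_sum += diff
    # Center update: the center moves toward the average of the workers.
    center = center + alpha * elastic_sum
    workers = new_workers

print("distance of center to w_true:", np.linalg.norm(center - w_true))
```

The elastic term is what allows each worker to explore away from the center while still being pulled back, which is the exploration/exploitation trade-off the paper emphasizes.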
Papers citing "Deep learning with Elastic Averaging SGD" (9 of 9 papers shown)

Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training
Hiroki Naganuma, Xinzhi Zhang, Man-Chung Yue, Ioannis Mitliagkas, Philipp A. Witte, Russell J. Hewett, Yin Tat Lee
126 · 0 · 0 · 25 Apr 2025

No Need to Talk: Asynchronous Mixture of Language Models
Anastasiia Filippova, Angelos Katharopoulos, David Grangier, Ronan Collobert
MoE · 54 · 0 · 0 · 04 Oct 2024

PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan, Dongsheng Li, Jiye Liang, Wenjian Wang, Xicheng Lu
56 · 1 · 0 · 01 Dec 2023

Weighted Averaged Stochastic Gradient Descent: Asymptotic Normality and Optimality
Ziyang Wei, Wanrong Zhu, Wei Biao Wu
44 · 5 · 0 · 13 Jul 2023

Integrated Model, Batch and Domain Parallelism in Training Neural Networks
A. Gholami, A. Azad, Peter H. Jin, Kurt Keutzer, A. Buluç
58 · 83 · 0 · 12 Dec 2017

Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer, Itay Hubara, Daniel Soudry
ODL · 138 · 798 · 0 · 24 May 2017

Asynchronous Stochastic Gradient Descent with Delay Compensation
Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhiming Ma, Tie-Yan Liu
80 · 313 · 0 · 27 Sep 2016

Distributed Bayesian Learning with Stochastic Natural-gradient Expectation Propagation and the Posterior Server
Leonard Hasenclever, Stefan Webb, Thibaut Lienart, Sebastian J. Vollmer, Balaji Lakshminarayanan, Charles Blundell, Yee Whye Teh
BDL · 77 · 70 · 0 · 31 Dec 2015

GPU Asynchronous Stochastic Gradient Descent to Speed Up Neural Network Training
T. Paine, Hailin Jin, Jianchao Yang, Zhe Lin, Thomas Huang
67 · 98 · 0 · 21 Dec 2013