AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
arXiv: 1712.02679
7 December 2017
Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan
Papers citing "AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training" (14 of 64 papers shown)
Associative Convolutional Layers
H. Omidvar, Vahideh Akhlaghi, M. Franceschetti, Rajesh K. Gupta
10 Jun 2019
DoubleSqueeze: Parallel Stochastic Gradient Descent with Double-Pass Error-Compensated Compression
Hanlin Tang, Xiangru Lian, Chen Yu, Tong Zhang, Ji Liu
15 May 2019
Priority-based Parameter Propagation for Distributed DNN Training
Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko
10 May 2019
Realizing Petabyte Scale Acoustic Modeling
S. Parthasarathi, Nitin Sivakrishnan, Pranav Ladkat, N. Strom
24 Apr 2019
Distributed Deep Learning Strategies For Automatic Speech Recognition
Wei Zhang, Xiaodong Cui, Ulrich Finkler, Brian Kingsbury, G. Saon, David S. Kung, M. Picheny
10 Apr 2019
A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks
Shaoshuai Shi, Qiang-qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu
14 Jan 2019
Pipe-SGD: A Decentralized Pipelined SGD Framework for Distributed Deep Net Training
Youjie Li, Hang Qiu, Songze Li, A. Avestimehr, Nam Sung Kim, Alex Schwing
08 Nov 2018
A Hitchhiker's Guide On Distributed Training of Deep Neural Networks
K. Chahal, Manraj Singh Grover, Kuntal Dey
28 Oct 2018
Computation Scheduling for Distributed Machine Learning with Straggling Workers
Mohammad Mohammadi Amiri, Deniz Gunduz
23 Oct 2018
Sparsified SGD with Memory
Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
20 Sep 2018
RedSync: Reducing Synchronization Traffic for Distributed Deep Learning
Jiarui Fang, Haohuan Fu, Guangwen Yang, Cho-Jui Hsieh
13 Aug 2018
ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang, Scott Sievert, Zachary B. Charles, Shengchao Liu, S. Wright, Dimitris Papailiopoulos
11 Jun 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun, Torsten Hoefler
26 Feb 2018
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally
05 Dec 2017