TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

22 May 2017

Yiran Chen

Papers citing "TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning"

17 / 467 papers shown

Title
MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server | Guoxin Cui, Jun Xu, Wei Zeng, Yanyan Lan, Jiafeng Guo, Xueqi Cheng | MQ | 8 | 13 | 0 | 22 Apr 2018
Toolflows for Mapping Convolutional Neural Networks on FPGAs: A Survey and Future Directions | Stylianos I. Venieris, Alexandros Kouris, C. Bouganis | | 19 | 184 | 0 | 15 Mar 2018
Deep Learning in Mobile and Wireless Networking: A Survey | Chaoyun Zhang, P. Patras, Hamed Haddadi | | 50 | 1,306 | 0 | 12 Mar 2018
TicTac: Accelerating Distributed Deep Learning with Communication Scheduling | Sayed Hadi Hashemi, Sangeetha Abdu Jyothi, R. Campbell | | 13 | 196 | 0 | 08 Mar 2018
Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis | Tal Ben-Nun, Torsten Hoefler | GNN | 33 | 704 | 0 | 26 Feb 2018
SparCML: High-Performance Sparse Communication for Machine Learning | Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler | | 29 | 126 | 0 | 22 Feb 2018
3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning | Hyeontaek Lim, D. Andersen, M. Kaminsky | | 21 | 70 | 0 | 21 Feb 2018
Variance-based Gradient Compression for Efficient Distributed Deep Learning | Yusuke Tsuzuku, H. Imachi, Takuya Akiba | FedML | 21 | 81 | 0 | 16 Feb 2018
Training and Inference with Integers in Deep Neural Networks | Shuang Wu, Guoqi Li, F. Chen, Luping Shi | MQ | 41 | 389 | 0 | 13 Feb 2018
signSGD: Compressed Optimisation for Non-Convex Problems | Jeremy Bernstein, Yu Wang, Kamyar Azizzadenesheli, Anima Anandkumar | FedML, ODL | 44 | 1,021 | 0 | 13 Feb 2018
Communication-Computation Efficient Gradient Coding | Min Ye, Emmanuel Abbe | | 26 | 158 | 0 | 09 Feb 2018
Distributed Deep Reinforcement Learning: Learn how to play Atari games in 21 minutes | Igor Adamski, R. Adamski, T. Grel, Adam Jedrych, Kamil Kaczmarek, Henryk Michalewski | OffRL | 41 | 37 | 0 | 09 Jan 2018
AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training | Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan | ODL | 18 | 173 | 0 | 07 Dec 2017
Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training | Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally | | 65 | 1,388 | 0 | 05 Dec 2017
Gradient Sparsification for Communication-Efficient Distributed Optimization | Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang | | 15 | 522 | 0 | 26 Oct 2017
Randomized Distributed Mean Estimation: Accuracy vs Communication | Jakub Konečný, Peter Richtárik | FedML | 33 | 101 | 0 | 22 Nov 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 312 | 2,896 | 0 | 15 Sep 2016