A Hitchhiker's Guide On Distributed Training of Deep Neural Networks

28 October 2018
K. Chahal, Manraj Singh Grover, Kuntal Dey
Communities: 3DH, OOD
arXiv: 1810.11787

Papers citing "A Hitchhiker's Guide On Distributed Training of Deep Neural Networks"

34 papers shown
• Communication optimization strategies for distributed deep neural network training: A survey
  Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
  06 Mar 2020 · Citations: 12
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
  11 Oct 2018 · Citations: 95,175 · Communities: VLM, SSL, SSeg
• Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
  Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
  30 Jul 2018 · Citations: 384
• Quantizing deep convolutional networks for efficient inference: A whitepaper
  Raghuraman Krishnamoorthi
  21 Jun 2018 · Citations: 1,021 · Communities: MQ
• A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
  L. Smith
  26 Mar 2018 · Citations: 1,035
• Horovod: fast and easy distributed deep learning in TensorFlow
  Alexander Sergeev, Mike Del Balso
  15 Feb 2018 · Citations: 1,221
• Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
  Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko
  15 Dec 2017 · Citations: 3,141 · Communities: MQ
• AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
  Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan
  07 Dec 2017 · Citations: 174 · Communities: ODL
• Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
  Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally
  05 Dec 2017 · Citations: 1,409
• Don't Decay the Learning Rate, Increase the Batch Size
  Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
  01 Nov 2017 · Citations: 996 · Communities: ODL
• Mixed Precision Training
  Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
  10 Oct 2017 · Citations: 1,805
• What does fault tolerant Deep Learning need from MPI?
  Vinay C. Amatya, Abhinav Vishnu, Charles Siegel, J. Daily
  11 Sep 2017 · Citations: 19
• Large Batch Training of Convolutional Networks
  Yang You, Igor Gitman, Boris Ginsburg
  13 Aug 2017 · Citations: 852 · Communities: ODL
• Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
  Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
  08 Jun 2017 · Citations: 3,685 · Communities: 3DH
• TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
  W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
  22 May 2017 · Citations: 990
• Sparse Communication for Distributed Gradient Descent
  Alham Fikri Aji, Kenneth Heafield
  17 Apr 2017 · Citations: 741
• YOLO9000: Better, Faster, Stronger
  Joseph Redmon, Ali Farhadi
  25 Dec 2016 · Citations: 15,633 · Communities: VLM, ObjD
• How to scale distributed deep learning?
  Peter H. Jin, Qiaochu Yuan, F. Iandola, Kurt Keutzer
  14 Nov 2016 · Citations: 137 · Communities: 3DH
• Federated Learning: Strategies for Improving Communication Efficiency
  Jakub Konecný, H. B. McMahan, Felix X. Yu, Peter Richtárik, A. Suresh, Dave Bacon
  18 Oct 2016 · Citations: 4,655 · Communities: FedML
• Asynchronous Stochastic Gradient Descent with Delay Compensation
  Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhiming Ma, Tie-Yan Liu
  27 Sep 2016 · Citations: 315
• An overview of gradient descent optimization algorithms
  Sebastian Ruder
  15 Sep 2016 · Citations: 6,202 · Communities: ODL
• TensorFlow: A system for large-scale machine learning
  Martín Abadi, P. Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, ..., Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
  27 May 2016 · Citations: 18,361 · Communities: GNN, AI4CE
• Revisiting Distributed Synchronous SGD
  Jianmin Chen, Xinghao Pan, R. Monga, Samy Bengio, Rafal Jozefowicz
  04 Apr 2016 · Citations: 801
• Deep Residual Learning for Image Recognition
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  10 Dec 2015 · Citations: 194,426 · Communities: MedIm
• Staleness-aware Async-SGD for Distributed Deep Learning
  Wei Zhang, Suyog Gupta, Xiangru Lian, Ji Liu
  18 Nov 2015 · Citations: 266
• Cyclical Learning Rates for Training Neural Networks
  L. Smith
  03 Jun 2015 · Citations: 2,537 · Communities: ODL
• LSTM: A Search Space Odyssey
  Klaus Greff, R. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
  13 Mar 2015 · Citations: 5,309 · Communities: AI4TS, VLM
• Adam: A Method for Stochastic Optimization
  Diederik P. Kingma, Jimmy Ba
  22 Dec 2014 · Citations: 150,312 · Communities: ODL
• Deep learning with Elastic Averaging SGD
  Sixin Zhang, A. Choromańska, Yann LeCun
  20 Dec 2014 · Citations: 611 · Communities: FedML
• ImageNet Large Scale Visual Recognition Challenge
  Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
  01 Sep 2014 · Citations: 39,595 · Communities: VLM, ObjD
• Caffe: Convolutional Architecture for Fast Feature Embedding
  Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, S. Guadarrama, Trevor Darrell
  20 Jun 2014 · Citations: 14,712 · Communities: VLM, BDL, 3DV
• Deep Learning in Neural Networks: An Overview
  Jürgen Schmidhuber
  30 Apr 2014 · Citations: 16,377 · Communities: HAI
• Playing Atari with Deep Reinforcement Learning
  Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
  19 Dec 2013 · Citations: 12,265
• HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
  Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright
  28 Jun 2011 · Citations: 2,274