A Hitchhiker's Guide On Distributed Training of Deep Neural Networks

28 October 2018
K. Chahal, Manraj Singh Grover, Kuntal Dey
Communities: 3DH, OOD
arXiv: 1810.11787

Papers citing "A Hitchhiker's Guide On Distributed Training of Deep Neural Networks"

34 papers shown
• Communication optimization strategies for distributed deep neural network training: A survey
  Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
  06 Mar 2020 · Citations: 12
• BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
  11 Oct 2018 · Citations: 95,175 · Communities: VLM, SSL, SSeg
• Highly Scalable Deep Learning Training System with Mixed-Precision: Training ImageNet in Four Minutes
  Xianyan Jia, Shutao Song, W. He, Yangzihao Wang, Haidong Rong, ..., Li Yu, Tiegang Chen, Guangxiao Hu, Shaoshuai Shi, Xiaowen Chu
  30 Jul 2018 · Citations: 384
• Quantizing deep convolutional networks for efficient inference: A whitepaper
  Raghuraman Krishnamoorthi
  21 Jun 2018 · Citations: 1,021 · Communities: MQ
• A disciplined approach to neural network hyper-parameters: Part 1 -- learning rate, batch size, momentum, and weight decay
  L. Smith
  26 Mar 2018 · Citations: 1,035
• Horovod: fast and easy distributed deep learning in TensorFlow
  Alexander Sergeev, Mike Del Balso
  15 Feb 2018 · Citations: 1,221
• Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
  Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko
  15 Dec 2017 · Citations: 3,141 · Communities: MQ
• AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
  Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan
  07 Dec 2017 · Citations: 174 · Communities: ODL
• Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
  Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally
  05 Dec 2017 · Citations: 1,409
• Don't Decay the Learning Rate, Increase the Batch Size
  Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le
  01 Nov 2017 · Citations: 996 · Communities: ODL
• Mixed Precision Training
  Paulius Micikevicius, Sharan Narang, Jonah Alben, G. Diamos, Erich Elsen, ..., Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, Hao Wu
  10 Oct 2017 · Citations: 1,805
• What does fault tolerant Deep Learning need from MPI?
  Vinay C. Amatya, Abhinav Vishnu, Charles Siegel, J. Daily
  11 Sep 2017 · Citations: 19
• Large Batch Training of Convolutional Networks
  Yang You, Igor Gitman, Boris Ginsburg
  13 Aug 2017 · Citations: 852 · Communities: ODL
• Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
  Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
  08 Jun 2017 · Citations: 3,685 · Communities: 3DH
• TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
  W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
  22 May 2017 · Citations: 990
• Sparse Communication for Distributed Gradient Descent
  Alham Fikri Aji, Kenneth Heafield
  17 Apr 2017 · Citations: 741
• YOLO9000: Better, Faster, Stronger
  Joseph Redmon, Ali Farhadi
  25 Dec 2016 · Citations: 15,633 · Communities: VLM, ObjD
• How to scale distributed deep learning?
  Peter H. Jin, Qiaochu Yuan, F. Iandola, Kurt Keutzer
  14 Nov 2016 · Citations: 137 · Communities: 3DH
• Federated Learning: Strategies for Improving Communication Efficiency
  Jakub Konecný, H. B. McMahan, Felix X. Yu, Peter Richtárik, A. Suresh, Dave Bacon
  18 Oct 2016 · Citations: 4,655 · Communities: FedML
• Asynchronous Stochastic Gradient Descent with Delay Compensation
  Shuxin Zheng, Qi Meng, Taifeng Wang, Wei Chen, Nenghai Yu, Zhiming Ma, Tie-Yan Liu
  27 Sep 2016 · Citations: 315
• An overview of gradient descent optimization algorithms
  Sebastian Ruder
  15 Sep 2016 · Citations: 6,202 · Communities: ODL
• TensorFlow: A system for large-scale machine learning
  Martín Abadi, P. Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, ..., Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
  27 May 2016 · Citations: 18,361 · Communities: GNN, AI4CE
• Revisiting Distributed Synchronous SGD
  Jianmin Chen, Xinghao Pan, R. Monga, Samy Bengio, Rafal Jozefowicz
  04 Apr 2016 · Citations: 801
• Deep Residual Learning for Image Recognition
  Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
  10 Dec 2015 · Citations: 194,426 · Communities: MedIm
• Staleness-aware Async-SGD for Distributed Deep Learning
  Wei Zhang, Suyog Gupta, Xiangru Lian, Ji Liu
  18 Nov 2015 · Citations: 266
• Cyclical Learning Rates for Training Neural Networks
  L. Smith
  03 Jun 2015 · Citations: 2,537 · Communities: ODL
• LSTM: A Search Space Odyssey
  Klaus Greff, R. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber
  13 Mar 2015 · Citations: 5,309 · Communities: AI4TS, VLM
• Adam: A Method for Stochastic Optimization
  Diederik P. Kingma, Jimmy Ba
  22 Dec 2014 · Citations: 150,312 · Communities: ODL
• Deep learning with Elastic Averaging SGD
  Sixin Zhang, A. Choromańska, Yann LeCun
  20 Dec 2014 · Citations: 611 · Communities: FedML
• ImageNet Large Scale Visual Recognition Challenge
  Olga Russakovsky, Jia Deng, Hao Su, J. Krause, S. Satheesh, ..., A. Karpathy, A. Khosla, Michael S. Bernstein, Alexander C. Berg, Li Fei-Fei
  01 Sep 2014 · Citations: 39,595 · Communities: VLM, ObjD
• Caffe: Convolutional Architecture for Fast Feature Embedding
  Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross B. Girshick, S. Guadarrama, Trevor Darrell
  20 Jun 2014 · Citations: 14,712 · Communities: VLM, BDL, 3DV
• Deep Learning in Neural Networks: An Overview
  Jürgen Schmidhuber
  30 Apr 2014 · Citations: 16,377 · Communities: HAI
• Playing Atari with Deep Reinforcement Learning
  Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Alex Graves, Ioannis Antonoglou, Daan Wierstra, Martin Riedmiller
  19 Dec 2013 · Citations: 12,265
• HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
  Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright
  28 Jun 2011 · Citations: 2,274