Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
arXiv:1711.05979, 16 November 2017
Shaoshuai Shi, Qiang-qiang Wang, Xiaowen Chu

Papers citing "Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs" (20 papers shown)

  • FedSlate: A Federated Deep Reinforcement Learning Recommender System. Yongxin Deng, Xihe Qiu, Jue Chen, Yaochu Jin. 23 Sep 2024.
  • Stochastic Nonconvex Optimization with Large Minibatches. Weiran Wang, Nathan Srebro. 25 Sep 2017.
  • ImageNet Training in Minutes. Yang You, Zhao-jie Zhang, Cho-Jui Hsieh, J. Demmel, Kurt Keutzer. 14 Sep 2017.
  • Distributed Training Large-Scale Deep Architectures. Shang-Xuan Zou, Chun-Yen Chen, Jui-Lin Wu, Chun-Nan Chou, Chia-Chin Tsao, Kuan-Chieh Tung, Ting-Wei Lin, Cheng-Lung Sung, Edward Y. Chang. 10 Aug 2017.
  • Optimized Broadcast for Deep Learning Workloads on Dense-GPU InfiniBand Clusters: MPI or NCCL? A. A. Awan, Ching-Hsiang Chu, Hari Subramoni, D. Panda. 28 Jul 2017.
  • Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters. Huatian Zhang, Zeyu Zheng, Shizhen Xu, Wei-Ming Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, P. Xie, Eric Xing. 11 Jun 2017.
  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour. Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He. 08 Jun 2017.
  • Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent. Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu. 25 May 2017.
  • Understanding deep learning requires rethinking generalization. Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals. 10 Nov 2016.
  • Benchmarking State-of-the-Art Deep Learning Software Tools. Shaoshuai Shi, Qiang-qiang Wang, Pengfei Xu, Xiaowen Chu. 25 Aug 2016.
  • Revisiting Distributed Synchronous SGD. Jianmin Chen, Xinghao Pan, R. Monga, Samy Bengio, Rafal Jozefowicz. 04 Apr 2016.
  • Distributed Deep Learning Using Synchronous Stochastic Gradient Descent. Dipankar Das, Sasikanth Avancha, Dheevatsa Mudigere, K. Vaidyanathan, Srinivas Sridharan, Dhiraj D. Kalamkar, Bharat Kaul, Pradeep Dubey. 22 Feb 2016.
  • Deep Residual Learning for Image Recognition. Kaiming He, Xinming Zhang, Shaoqing Ren, Jian Sun. 10 Dec 2015.
  • MXNet: A Flexible and Efficient Machine Learning Library for Heterogeneous Distributed Systems. Tianqi Chen, Mu Li, Yutian Li, Min Lin, Naiyan Wang, Minjie Wang, Tianjun Xiao, Bing Xu, Chiyuan Zhang, Zheng Zhang. 03 Dec 2015.
  • Comparative Study of Deep Learning Software Frameworks. S. Bahrampour, Naveen Ramakrishnan, Lukas Schott, Mohak Shah. 19 Nov 2015.
  • Fast Algorithms for Convolutional Neural Networks. Andrew Lavin, Scott Gray. 30 Sep 2015.
  • The Potential of the Intel Xeon Phi for Supervised Deep Learning. Andre Viebke, Sabri Pllana. 30 Jun 2015.
  • cuDNN: Efficient Primitives for Deep Learning. Sharan Chetlur, Cliff Woolley, Philippe Vandermersch, Jonathan M. Cohen, J. Tran, Bryan Catanzaro, Evan Shelhamer. 03 Oct 2014.
  • Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, Scott E. Reed, Dragomir Anguelov, D. Erhan, Vincent Vanhoucke, Andrew Rabinovich. 17 Sep 2014.
  • Fast Training of Convolutional Networks through FFTs. Michaël Mathieu, Mikael Henaff, Yann LeCun. 20 Dec 2013.