Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters
arXiv:1706.03292, 11 June 2017
Hao Zhang, Zeyu Zheng, Shizhen Xu, Wei Dai, Qirong Ho, Xiaodan Liang, Zhiting Hu, Jinliang Wei, Pengtao Xie, Eric P. Xing

Papers citing "Poseidon: An Efficient Communication Architecture for Distributed Deep Learning on GPU Clusters" (41 papers shown)

Asynchronous Stochastic Gradient Descent with Decoupled Backpropagation and Layer-Wise Updates
Cabrel Teguemne Fokam, Khaleelulla Khan Nazeer, Lukas König, David Kappel, Anand Subramoney (08 Oct 2024)

HAP: SPMD DNN Training on Heterogeneous GPU Clusters with Automated Program Synthesis
Shiwei Zhang, Lansong Diao, Chuan Wu, Zongyan Cao, Siyu Wang, Wei Lin (11 Jan 2024)

Automated Tensor Model Parallelism with Overlapped Communication for Efficient Foundation Model Training
Shengwei Li, Zhiquan Lai, Yanqi Hao, Weijie Liu, Ke-shi Ge, Xiaoge Deng, Dongsheng Li, KaiCheng Lu (25 May 2023)

Cloudless-Training: A Framework to Improve Efficiency of Geo-Distributed ML Training
W. Tan, Xiao Shi, Cunchi Lv, Xiaofang Zhao (09 Mar 2023)

Expediting Distributed DNN Training with Device Topology-Aware Graph Deployment
Shiwei Zhang, Xiaodong Yi, Lansong Diao, Chuan Wu, Siyu Wang, W. Lin (13 Feb 2023)

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee (24 Jan 2023)

FreshGNN: Reducing Memory Access via Stable Historical Embeddings for Graph Neural Network Training
Kezhao Huang, Haitian Jiang, Minjie Wang, Guangxuan Xiao, David Wipf, Xiang Song, Quan Gan, Zengfeng Huang, Jidong Zhai, Zheng-Wei Zhang (18 Jan 2023)

Does compressing activations help model parallel training?
S. Bian, Dacheng Li, Hongyi Wang, Eric P. Xing, Shivaram Venkataraman (06 Jan 2023)

AMP: Automatically Finding Model Parallel Strategies with Heterogeneity Awareness
Dacheng Li, Hongyi Wang, Eric P. Xing, Haotong Zhang (13 Oct 2022)

Scalable K-FAC Training for Deep Neural Networks with Distributed Preconditioning
Lin Zhang, S. Shi, Wei Wang, Bo-wen Li (30 Jun 2022)

Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters
Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, ..., Yue Yu, Ge Li, Yu Sun, Yanjun Ma, Dianhai Yu (19 May 2022)

Efficient Direct-Connect Topologies for Collective Communications
Liangyu Zhao, Siddharth Pal, Tapan Chugh, Weiyang Wang, Jason Fantl, P. Basu, J. Khoury, Arvind Krishnamurthy (07 Feb 2022)

HeterPS: Distributed Deep Learning With Reinforcement Learning Based Scheduling in Heterogeneous Environments
Ji Liu, Zhihua Wu, Dianhai Yu, Yanjun Ma, Danlei Feng, Minxu Zhang, Xinxuan Wu, Xuefeng Yao, Dejing Dou (20 Nov 2021)

AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning
Siddharth Singh, A. Bhatele (25 Oct 2021)

Multi-model Machine Learning Inference Serving with GPU Spatial Partitioning
S. Choi, Sunho Lee, Yeonjae Kim, Jongse Park, Youngjin Kwon, Jaehyuk Huh (01 Sep 2021)

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
S. Shi, Lin Zhang, Bo-wen Li (14 Jul 2021)

BAGUA: Scaling up Distributed Learning with System Relaxations
Shaoduo Gan, Xiangru Lian, Rui Wang, Jianbin Chang, Chengjun Liu, ..., Jiawei Jiang, Binhang Yuan, Sen Yang, Ji Liu, Ce Zhang (03 Jul 2021)

Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang, Yao Zhang, Yanzhang Wang, Yan Wan, Jiao Wang, Zhongyuan Wu, Yuhao Yang, Bowen She (01 Jul 2021)

DynaComm: Accelerating Distributed CNN Training between Edges and Clouds through Dynamic Communication Scheduling
Shangming Cai, Dongsheng Wang, Haixia Wang, Yongqiang Lyu, Guangquan Xu, Xi Zheng, A. Vasilakos (20 Jan 2021)

Video Big Data Analytics in the Cloud: A Reference Architecture, Survey, Opportunities, and Open Research Issues
A. Alam, I. Ullah, Young-Koo Lee (16 Nov 2020)

Synthesizing Optimal Collective Algorithms
Zixian Cai, Zhengyang Liu, Saeed Maleki, Madan Musuvathi, Todd Mytkowicz, Jacob Nelson, Olli Saarikivi (19 Aug 2020)

DS-Sync: Addressing Network Bottlenecks with Divide-and-Shuffle Synchronization for Distributed DNN Training
Weiyan Wang, Cengguang Zhang, Liu Yang, Kai Chen, Kun Tan (07 Jul 2020)

DAPPLE: A Pipelined Data Parallel Approach for Training Large Models
Shiqing Fan, Yi Rong, Chen Meng, Zongyan Cao, Siyu Wang, ..., Jun Yang, Lixue Xia, Lansong Diao, Xiaoyong Liu, Wei Lin (02 Jul 2020)

Characterizing and Modeling Distributed Training with Transient Cloud GPU Servers
Shijian Li, R. Walls, Tian Guo (07 Apr 2020)

Distributed Hierarchical GPU Parameter Server for Massive Scale Deep Learning Ads Systems
Weijie Zhao, Deping Xie, Ronglai Jia, Yulei Qian, Rui Ding, Mingming Sun, P. Li (12 Mar 2020)

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao (06 Mar 2020)

Adaptive Gradient Sparsification for Efficient Federated Learning: An Online Learning Approach
Pengchao Han, Shiqiang Wang, K. Leung (14 Jan 2020)

Pipelined Training with Stale Weights of Deep Convolutional Neural Networks
Lifu Zhang, T. Abdelrahman (29 Dec 2019)

MG-WFBP: Merging Gradients Wisely for Efficient Communication in Distributed Deep Learning
S. Shi, X. Chu, Bo Li (18 Dec 2019)

Layer-wise Adaptive Gradient Sparsification for Distributed Deep Learning with Convergence Guarantees
S. Shi, Zhenheng Tang, Qiang-qiang Wang, Kaiyong Zhao, X. Chu (20 Nov 2019)

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning
Aritra Dutta, El Houcine Bergou, A. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis (19 Nov 2019)

Characterizing Deep Learning Training Workloads on Alibaba-PAI
Mengdi Wang, Chen Meng, Guoping Long, Chuan Wu, Jun Yang, Wei Lin, Yangqing Jia (14 Oct 2019)

Priority-based Parameter Propagation for Distributed DNN Training
Anand Jayarajan, Jinliang Wei, Garth A. Gibson, Alexandra Fedorova, Gennady Pekhimenko (10 May 2019)

A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks
S. Shi, Qiang-qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu (14 Jan 2019)

Fault Tolerance in Iterative-Convergent Machine Learning
Aurick Qiao, Bryon Aragam, Bingjing Zhang, Eric P. Xing (17 Oct 2018)

Collaborative Deep Learning Across Multiple Data Centers
Kele Xu, Haibo Mi, Dawei Feng, Huaimin Wang, Chuan Chen, Zibin Zheng, Xu Lan (16 Oct 2018)

Parallax: Sparsity-aware Data Parallel Training of Deep Neural Networks
Soojeong Kim, Gyeong-In Yu, Hojin Park, Sungwoo Cho, Eunji Jeong, Hyeonmin Ha, Sanha Lee, Joo Seong Jeong, Byung-Gon Chun (08 Aug 2018)

Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis
Tal Ben-Nun, Torsten Hoefler (26 Feb 2018)

3LC: Lightweight and Effective Traffic Compression for Distributed Machine Learning
Hyeontaek Lim, D. Andersen, M. Kaminsky (21 Feb 2018)

Exploring Hidden Dimensions in Parallelizing Convolutional Neural Networks
Zhihao Jia, Sina Lin, C. Qi, A. Aiken (14 Feb 2018)

Performance Modeling and Evaluation of Distributed Deep Learning Frameworks on GPUs
S. Shi, Qiang-qiang Wang, Xiaowen Chu (16 Nov 2017)