Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters

20 October 2020 · arXiv:2010.10458
S. Shi, Xianhao Zhou, Shutao Song, Xingyao Wang, Zilin Zhu, Xuelin Huang, Xinan Jiang, Feihu Zhou, Zhenyu Guo, Liqiang Xie, Rui Lan, Xianbin Ouyang, Yan Zhang, Jieqian Wei, Jing Gong, Weiliang Lin, Ping Gao, Peng Meng, Xiaomin Xu, Chenyang Guo, Bo Yang, Zhibo Chen, Yongjian Wu, X. Chu
GNN

Papers citing "Towards Scalable Distributed Training of Deep Learning on Public Cloud Clusters"

7 papers shown

Cloudless-Training: A Framework to Improve Efficiency of Geo-Distributed ML Training
W. Tan, Xiao Shi, Cunchi Lv, Xiaofang Zhao
FedML · 09 Mar 2023

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
GNN · 24 Jan 2023

Nebula-I: A General Framework for Collaboratively Training Deep Learning Models on Low-Bandwidth Cloud Clusters
Yang Xiang, Zhihua Wu, Weibao Gong, Siyu Ding, Xianjie Mo, ..., Yue Yu, Ge Li, Yu Sun, Yanjun Ma, Dianhai Yu
19 May 2022

Accelerating Distributed K-FAC with Smart Parallelism of Computing and Communication Tasks
S. Shi, Lin Zhang, Bo-wen Li
14 Jul 2021

On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal, Hongyi Wang, Shivaram Venkataraman, Dimitris Papailiopoulos
28 Feb 2021

1-bit Adam: Communication Efficient Large-Scale Training with Adam's Convergence Speed
Hanlin Tang, Shaoduo Gan, A. A. Awan, Samyam Rajbhandari, Conglong Li, Xiangru Lian, Ji Liu, Ce Zhang, Yuxiong He
AI4CE · 04 Feb 2021

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
A. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini
26 Jan 2021