arXiv: 2209.08497
Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment
18 September 2022
Daegun Yoon, Sangyoon Oh

Papers citing "Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment" (7 papers shown):

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, ..., Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, K. Gopalakrishnan
21 Apr 2021

Adaptive Gradient Quantization for Data-Parallel SGD
Fartash Faghri, Iman Tabrizian, I. Markov, Dan Alistarh, Daniel M. Roy, Ali Ramezani-Kebrya
23 Oct 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks
Shaoshuai Shi, Qiang-qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu
14 Jan 2019

AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan
07 Dec 2017

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally
05 Dec 2017

TensorFlow: A system for large-scale machine learning
Martín Abadi, P. Barham, Jianmin Chen, Zhiwen Chen, Andy Davis, ..., Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zhang
27 May 2016