Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment

18 September 2022
Daegun Yoon, Sangyoon Oh
arXiv:2209.08497
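
For context, the paper's subject is top-k gradient sparsification: each worker communicates only the k largest-magnitude entries of its gradient rather than the full dense tensor. Below is a minimal sketch of the idea, assuming PyTorch; the helper name topk_sparsify and the 1% compression ratio are illustrative choices, not taken from the paper.

import torch

def topk_sparsify(grad: torch.Tensor, k: int):
    # Keep only the k largest-magnitude entries and return
    # (values, indices) instead of the dense gradient.
    flat = grad.flatten()
    _, idx = torch.topk(flat.abs(), k)
    return flat[idx], idx

# Example: compress a gradient to its top 1% of entries.
g = torch.randn(1_000_000)
k = max(1, int(0.01 * g.numel()))
values, indices = topk_sparsify(g, k)

# The receiver rebuilds a sparse gradient of the original shape.
recovered = torch.zeros_like(g)
recovered[indices] = values

In practice, and in several of the citing papers below, the entries not selected are accumulated locally as an error residual and added back to the next step's gradient, so no update is permanently discarded.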

Papers citing "Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment"

7 / 7 papers shown

ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen, Jiamin Ni, Songtao Lu, Xiaodong Cui, Pin-Yu Chen, ..., Naigang Wang, Swagath Venkataramani, Vijayalakshmi Srinivasan, Wei Zhang, K. Gopalakrishnan
21 Apr 2021

Adaptive Gradient Quantization for Data-Parallel SGD
Fartash Faghri, Iman Tabrizian, I. Markov, Dan Alistarh, Daniel M. Roy, Ali Ramezani-Kebrya
23 Oct 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
17 Sep 2019

A Distributed Synchronous SGD Algorithm with Global Top-k Sparsification for Low Bandwidth Networks
Shaohuai Shi, Qiang Wang, Kaiyong Zhao, Zhenheng Tang, Yuxin Wang, Xiang Huang, Xiaowen Chu
14 Jan 2019

AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan
07 Dec 2017

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally
05 Dec 2017

TensorFlow: A system for large-scale machine learning
Martín Abadi, P. Barham, Jianmin Chen, Zhifeng Chen, Andy Davis, ..., Vijay Vasudevan, Pete Warden, Martin Wicke, Yuan Yu, Xiaoqiang Zheng
27 May 2016