
Rethinking gradient sparsification as total error minimization (arXiv:2108.00951)

2 August 2021
Atal Narayan Sahu, Aritra Dutta, A. Abdelmoniem, Trambak Banerjee, Marco Canini, Panos Kalnis
ArXiv (abs) · PDF · HTML

Papers citing "Rethinking gradient sparsification as total error minimization"

39 papers

EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu, Huck Yang, Nai Chit Fung, Charbel Sakr, Hongxu Yin, ..., Jan Kautz, Yu-Chun Wang, Pavlo Molchanov, Min-Hung Chen
MQ · 28 Oct 2024

DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning
Guangfeng Yan, Shao-Lun Huang, Tian-Shing Lan, Linqi Song
MQ · 30 Jul 2021

Compressed Communication for Distributed Training: Adaptive Methods and System
Yuchen Zhong, Cong Xie, Shuai Zheng, Yanghua Peng
17 May 2021

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
A. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini
26 Jan 2021

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos
29 Oct 2020

Linearly Converging Error Compensated SGD
Eduard A. Gorbunov, D. Kovalev, Dmitry Makarenko, Peter Richtárik
23 Oct 2020

Optimal Gradient Compression for Distributed and Federated Learning
Alyazeed Albasyoni, M. Safaryan, Laurent Condat, Peter Richtárik
FedML · 07 Oct 2020

CSER: Communication-efficient SGD with Error Reset
Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Yanghua Peng
26 Jul 2020

Breaking the Communication-Privacy-Accuracy Trilemma
Wei-Ning Chen, Peter Kairouz, Ayfer Özgür
22 Jul 2020

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning
Samuel Horváth, Peter Richtárik
19 Jun 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 28 May 2020

On Biased Compression for Distributed Learning
Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, M. Safaryan
27 Feb 2020

Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor
M. Safaryan, Egor Shulgin, Peter Richtárik
20 Feb 2020

Understanding Top-k Sparsification in Distributed Deep Learning
Shaoshuai Shi, Xiaowen Chu, Ka Chun Cheung, Simon See
20 Nov 2019

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning
Aritra Dutta, El Houcine Bergou, A. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis
19 Nov 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 17 Sep 2019

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
Sebastian U. Stich, Sai Praneeth Karimireddy
FedML · 11 Sep 2019

Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Debraj Basu, Deepesh Data, C. Karakuş, Suhas Diggavi
MQ · 06 Jun 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
31 May 2019

Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback
Shuai Zheng, Ziyue Huang, James T. Kwok
27 May 2019

Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi
28 Jan 2019

The Convergence of Sparsified Gradient Methods
Dan Alistarh, Torsten Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
27 Sep 2018

Sparsified SGD with Memory
Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
20 Sep 2018

ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang, Scott Sievert, Zachary B. Charles, Shengchao Liu, S. Wright, Dimitris Papailiopoulos
11 Jun 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML · 24 May 2018

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally
05 Dec 2017

Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
26 Oct 2017

Squeeze-and-Excitation Networks
Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu
05 Sep 2017

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu
25 May 2017

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
22 May 2017

Sparse Communication for Distributed Gradient Descent
Alham Fikri Aji, Kenneth Heafield
17 Apr 2017

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnović
MQ · 07 Oct 2016

Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
RALM · 26 Sep 2016

Communication-Efficient Learning of Deep Networks from Decentralized Data
H. B. McMahan, Eider Moore, Daniel Ramage, S. Hampson, Blaise Agüera y Arcas
FedML · 17 Feb 2016

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
MedIm · 10 Dec 2015

8-Bit Approximations for Parallelism in Deep Learning
Tim Dettmers
14 Nov 2015

Deep Learning with Limited Numerical Precision
Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan
HAI · 09 Feb 2015

Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
ODL · 22 Dec 2014

Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, Scott E. Reed, Dragomir Anguelov, D. Erhan, Vincent Vanhoucke, Andrew Rabinovich
17 Sep 2014