arXiv: 2108.00951
Rethinking gradient sparsification as total error minimization
Atal Narayan Sahu, Aritra Dutta, A. Abdelmoniem, Trambak Banerjee, Marco Canini, Panos Kalnis
2 August 2021
Papers citing "Rethinking gradient sparsification as total error minimization" (39 papers):
- EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation. Shih-yang Liu, Huck Yang, Nai Chit Fung, Charbel Sakr, Hongxu Yin, ..., Jan Kautz, Yu-Chun Wang, Pavlo Molchanov, Min-Hung Chen. [MQ] 28 Oct 2024. 0 citations.
- DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning. Guangfeng Yan, Shao-Lun Huang, Tian-Shing Lan, Linqi Song. [MQ] 30 Jul 2021. 6 citations.
- Compressed Communication for Distributed Training: Adaptive Methods and System. Yuchen Zhong, Cong Xie, Shuai Zheng, Yanghua Peng. 17 May 2021. 9 citations.
- An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems. A. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini. 26 Jan 2021. 77 citations.
- Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification. Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos. 29 Oct 2020. 25 citations.
- Linearly Converging Error Compensated SGD. Eduard A. Gorbunov, D. Kovalev, Dmitry Makarenko, Peter Richtárik. 23 Oct 2020. 78 citations.
- Optimal Gradient Compression for Distributed and Federated Learning. Alyazeed Albasyoni, M. Safaryan, Laurent Condat, Peter Richtárik. [FedML] 07 Oct 2020. 64 citations.
- CSER: Communication-efficient SGD with Error Reset. Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Yanghua Peng. 26 Jul 2020. 40 citations.
- Breaking the Communication-Privacy-Accuracy Trilemma. Wei-Ning Chen, Peter Kairouz, Ayfer Özgür. 22 Jul 2020. 120 citations.
- A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning. Samuel Horváth, Peter Richtárik. 19 Jun 2020. 60 citations.
- Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. [BDL] 28 May 2020. 42,520 citations.
- On Biased Compression for Distributed Learning. Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, M. Safaryan. 27 Feb 2020. 189 citations.
- Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor. M. Safaryan, Egor Shulgin, Peter Richtárik. 20 Feb 2020. 61 citations.
- Understanding Top-k Sparsification in Distributed Deep Learning. Shaoshuai Shi, Xiaowen Chu, Ka Chun Cheung, Simon See. 20 Nov 2019. 101 citations.
- On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning. Aritra Dutta, El Houcine Bergou, A. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis. 19 Nov 2019. 77 citations.
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism. Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro. [MoE] 17 Sep 2019. 1,920 citations.
- The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication. Sebastian U. Stich, Sai Praneeth Karimireddy. [FedML] 11 Sep 2019. 20 citations.
- Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations. Debraj Basu, Deepesh Data, C. Karakuş, Suhas Diggavi. [MQ] 06 Jun 2019. 406 citations.
- PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization. Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi. 31 May 2019. 322 citations.
- Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback. Shuai Zheng, Ziyue Huang, James T. Kwok. 27 May 2019. 115 citations.
- Error Feedback Fixes SignSGD and other Gradient Compression Schemes. Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi. 28 Jan 2019. 503 citations.
- The Convergence of Sparsified Gradient Methods. Dan Alistarh, Torsten Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli. 27 Sep 2018. 493 citations.
- Sparsified SGD with Memory. Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi. 20 Sep 2018. 753 citations.
- ATOMO: Communication-efficient Learning via Atomic Sparsification. Hongyi Wang, Scott Sievert, Zachary B. Charles, Shengchao Liu, S. Wright, Dimitris Papailiopoulos. 11 Jun 2018. 354 citations.
- Local SGD Converges Fast and Communicates Little. Sebastian U. Stich. [FedML] 24 May 2018. 1,070 citations.
- Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training. Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally. 05 Dec 2017. 1,410 citations.
- Gradient Sparsification for Communication-Efficient Distributed Optimization. Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang. 26 Oct 2017. 529 citations.
- Squeeze-and-Excitation Networks. Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu. 05 Sep 2017. 26,605 citations.
- Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent. Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu. 25 May 2017. 1,235 citations.
- TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning. W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li. 22 May 2017. 990 citations.
- Sparse Communication for Distributed Gradient Descent. Alham Fikri Aji, Kenneth Heafield. 17 Apr 2017. 742 citations.
- QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding. Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnović. [MQ] 07 Oct 2016. 421 citations.
- Pointer Sentinel Mixture Models. Stephen Merity, Caiming Xiong, James Bradbury, R. Socher. [RALM] 26 Sep 2016. 2,900 citations.
- Communication-Efficient Learning of Deep Networks from Decentralized Data. H. B. McMahan, Eider Moore, Daniel Ramage, S. Hampson, Blaise Agüera y Arcas. [FedML] 17 Feb 2016. 17,615 citations.
- Deep Residual Learning for Image Recognition. Kaiming He, Xinming Zhang, Shaoqing Ren, Jian Sun. [MedIm] 10 Dec 2015. 194,641 citations.
- 8-Bit Approximations for Parallelism in Deep Learning. Tim Dettmers. 14 Nov 2015. 176 citations.
- Deep Learning with Limited Numerical Precision. Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan. [HAI] 09 Feb 2015. 2,049 citations.
- Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. [ODL] 22 Dec 2014. 150,433 citations.
- Going Deeper with Convolutions. Christian Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, Scott E. Reed, Dragomir Anguelov, D. Erhan, Vincent Vanhoucke, Andrew Rabinovich. 17 Sep 2014. 43,717 citations.