
Rethinking gradient sparsification as total error minimization (arXiv:2108.00951)

2 August 2021
Atal Narayan Sahu, Aritra Dutta, A. Abdelmoniem, Trambak Banerjee, Marco Canini, Panos Kalnis
ArXiv (abs) · PDF · HTML

Papers citing "Rethinking gradient sparsification as total error minimization"

39 papers

EoRA: Fine-tuning-free Compensation for Compressed LLM with Eigenspace Low-Rank Approximation
Shih-yang Liu, Huck Yang, Nai Chit Fung, Charbel Sakr, Hongxu Yin, ..., Jan Kautz, Yu-Chun Wang, Pavlo Molchanov, Min-Hung Chen
MQ · 28 Oct 2024

DQ-SGD: Dynamic Quantization in SGD for Communication-Efficient Distributed Learning
Guangfeng Yan, Shao-Lun Huang, Tian-Shing Lan, Linqi Song
MQ · 30 Jul 2021

Compressed Communication for Distributed Training: Adaptive Methods and System
Yuchen Zhong, Cong Xie, Shuai Zheng, Yanghua Peng
17 May 2021

An Efficient Statistical-based Gradient Compression Technique for Distributed Training Systems
A. Abdelmoniem, Ahmed Elzanaty, Mohamed-Slim Alouini, Marco Canini
26 Jan 2021

Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification
Saurabh Agarwal, Hongyi Wang, Kangwook Lee, Shivaram Venkataraman, Dimitris Papailiopoulos
29 Oct 2020

Linearly Converging Error Compensated SGD
Eduard A. Gorbunov, D. Kovalev, Dmitry Makarenko, Peter Richtárik
23 Oct 2020

Optimal Gradient Compression for Distributed and Federated Learning
Alyazeed Albasyoni, M. Safaryan, Laurent Condat, Peter Richtárik
FedML · 07 Oct 2020

CSER: Communication-efficient SGD with Error Reset
Cong Xie, Shuai Zheng, Oluwasanmi Koyejo, Indranil Gupta, Mu Li, Yanghua Peng
26 Jul 2020

Breaking the Communication-Privacy-Accuracy Trilemma
Wei-Ning Chen, Peter Kairouz, Ayfer Özgür
22 Jul 2020

A Better Alternative to Error Feedback for Communication-Efficient Distributed Learning
Samuel Horváth, Peter Richtárik
19 Jun 2020

Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL · 28 May 2020

On Biased Compression for Distributed Learning
Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, M. Safaryan
27 Feb 2020

Uncertainty Principle for Communication Compression in Distributed and Federated Learning and the Search for an Optimal Compressor
M. Safaryan, Egor Shulgin, Peter Richtárik
20 Feb 2020

Understanding Top-k Sparsification in Distributed Deep Learning
Shaoshuai Shi, Xiaowen Chu, Ka Chun Cheung, Simon See
20 Nov 2019

On the Discrepancy between the Theoretical Analysis and Practical Implementations of Compressed Communication for Distributed Deep Learning
Aritra Dutta, El Houcine Bergou, A. Abdelmoniem, Chen-Yu Ho, Atal Narayan Sahu, Marco Canini, Panos Kalnis
19 Nov 2019

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 17 Sep 2019

The Error-Feedback Framework: Better Rates for SGD with Delayed Gradients and Compressed Communication
Sebastian U. Stich, Sai Praneeth Karimireddy
FedML · 11 Sep 2019

Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification, and Local Computations
Debraj Basu, Deepesh Data, C. Karakuş, Suhas Diggavi
MQ · 06 Jun 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels, Sai Praneeth Karimireddy, Martin Jaggi
31 May 2019

Communication-Efficient Distributed Blockwise Momentum SGD with Error-Feedback
Shuai Zheng, Ziyue Huang, James T. Kwok
27 May 2019

Error Feedback Fixes SignSGD and other Gradient Compression Schemes
Sai Praneeth Karimireddy, Quentin Rebjock, Sebastian U. Stich, Martin Jaggi
28 Jan 2019

The Convergence of Sparsified Gradient Methods
Dan Alistarh, Torsten Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
27 Sep 2018

Sparsified SGD with Memory
Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
20 Sep 2018

ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang, Scott Sievert, Zachary B. Charles, Shengchao Liu, S. Wright, Dimitris Papailiopoulos
11 Jun 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML · 24 May 2018

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin, Song Han, Huizi Mao, Yu Wang, W. Dally
05 Dec 2017

Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
26 Oct 2017

Squeeze-and-Excitation Networks
Jie Hu, Li Shen, Samuel Albanie, Gang Sun, Enhua Wu
05 Sep 2017

Can Decentralized Algorithms Outperform Centralized Algorithms? A Case Study for Decentralized Parallel Stochastic Gradient Descent
Xiangru Lian, Ce Zhang, Huan Zhang, Cho-Jui Hsieh, Wei Zhang, Ji Liu
25 May 2017

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
22 May 2017

Sparse Communication for Distributed Gradient Descent
Alham Fikri Aji, Kenneth Heafield
17 Apr 2017

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnović
MQ · 07 Oct 2016

Pointer Sentinel Mixture Models
Stephen Merity, Caiming Xiong, James Bradbury, R. Socher
RALM · 26 Sep 2016

Communication-Efficient Learning of Deep Networks from Decentralized Data
H. B. McMahan, Eider Moore, Daniel Ramage, S. Hampson, Blaise Agüera y Arcas
FedML · 17 Feb 2016

Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
MedIm · 10 Dec 2015

8-Bit Approximations for Parallelism in Deep Learning
Tim Dettmers
14 Nov 2015

Deep Learning with Limited Numerical Precision
Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan
HAI · 09 Feb 2015

Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
ODL · 22 Dec 2014

Going Deeper with Convolutions
Christian Szegedy, Wei Liu, Yangqing Jia, P. Sermanet, Scott E. Reed, Dragomir Anguelov, D. Erhan, Vincent Vanhoucke, Andrew Rabinovich
17 Sep 2014