Adaptive Gradient Quantization for Data-Parallel SGD

23 October 2020

Dan Alistarh

Ali Ramezani-Kebrya

Papers citing "Adaptive Gradient Quantization for Data-Parallel SGD"


Title
Differentiable Weightless Neural Networks Alan T. L. Bacellar Zachary Susskind Mauricio Breternitz Jr. E. John L. John P. Lima F. M. G. França 34 3 0 14 Oct 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey Feng Liang Zhen Zhang Haifeng Lu Victor C. M. Leung Yanyi Guo Xiping Hu GNN 39 6 0 09 Apr 2024
Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates Ahmad Rammal Kaja Gruntkowska Nikita Fedin Eduard A. Gorbunov Peter Richtárik 50 5 0 15 Oct 2023
Distributed Extra-gradient with Optimal Complexity and Communication Guarantees Ali Ramezani-Kebrya Kimon Antonakopoulos Igor Krawczuk Justin Deschenaux V. Cevher 41 3 0 17 Aug 2023
Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models James O'Neill Sourav Dutta VLM MQ 47 1 0 12 Jul 2023
FedREP: A Byzantine-Robust, Communication-Efficient and Privacy-Preserving Framework for Federated Learning Yi-Rui Yang Kun Wang Wulu Li FedML 52 3 0 09 Mar 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression Jaeyong Song Jinkyu Yim Jaewon Jung Hongsun Jang H. Kim Youngsok Kim Jinho Lee GNN 34 25 0 24 Jan 2023
Adaptive Compression for Communication-Efficient Distributed Training Maksim Makarenko Elnur Gasanov Rustem Islamov Abdurakhmon Sadiev Peter Richtárik 55 14 0 31 Oct 2022
L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning Mohammadreza Alimohammadi I. Markov Elias Frantar Dan Alistarh 40 5 0 31 Oct 2022
lo-fi: distributed fine-tuning without communication Mitchell Wortsman Suchin Gururangan Shen Li Ali Farhadi Ludwig Schmidt Michael G. Rabbat Ari S. Morcos 39 24 0 19 Oct 2022
Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment Daegun Yoon Sangyoon Oh 28 0 0 18 Sep 2022
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees Jue Wang Binhang Yuan Luka Rimanic Yongjun He Tri Dao Beidi Chen Christopher Ré Ce Zhang AI4CE 31 11 0 02 Jun 2022
Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient Xiaoge Deng Dongsheng Li Tao Sun Xicheng Lu FedML 28 0 0 08 Dec 2021
What Do We Mean by Generalization in Federated Learning? Honglin Yuan Warren Morningstar Lin Ning K. Singhal OOD FedML 46 71 0 27 Oct 2021
Fundamental limits of over-the-air optimization: Are analog schemes optimal? Shubham K. Jha Prathamesh Mayekar Himanshu Tyagi 29 7 0 11 Sep 2021
NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization Ali Ramezani-Kebrya Fartash Faghri Ilya Markov V. Aksenov Dan Alistarh Daniel M. Roy MQ 65 31 0 28 Apr 2021
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices Max Ryabinin Eduard A. Gorbunov Vsevolod Plokhotnyuk Gennady Pekhimenko 42 33 0 04 Mar 2021