Adaptive Gradient Quantization for Data-Parallel SGD

23 October 2020
Fartash Faghri, Iman Tabrizian, I. Markov, Dan Alistarh, Daniel M. Roy, Ali Ramezani-Kebrya
Tags: MQ
Links: ArXiv · PDF · HTML
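
For context on the kind of compression the citing works below build on, here is a minimal, generic sketch of QSGD-style stochastic uniform gradient quantization for data-parallel training. It is only an illustration of the general idea, not the adaptive scheme proposed in this paper (which adjusts the quantization levels during training); the function names quantize/dequantize and the num_levels parameter are hypothetical choices for this sketch.

import numpy as np

def quantize(grad, num_levels=16, rng=None):
    # Stochastic uniform quantization of a gradient vector (QSGD-style):
    # magnitudes are scaled by the gradient norm, rounded stochastically to
    # one of `num_levels` levels, and the sign is kept. Unbiased in expectation.
    rng = np.random.default_rng() if rng is None else rng
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return np.zeros_like(grad), norm
    scaled = np.abs(grad) / norm * num_levels
    lower = np.floor(scaled)
    levels = lower + (rng.random(grad.shape) < (scaled - lower))
    return np.sign(grad) * levels / num_levels, norm

def dequantize(q, norm):
    # Recover an unbiased estimate of the original gradient.
    return q * norm

# Each worker would quantize its local gradient before communication;
# the server averages the dequantized estimates.
g = np.random.default_rng(0).standard_normal(10)
q, n = quantize(g, num_levels=16)
g_hat = dequantize(q, n)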

Papers citing "Adaptive Gradient Quantization for Data-Parallel SGD"

17 / 17 papers shown
Differentiable Weightless Neural Networks
Alan T. L. Bacellar, Zachary Susskind, Mauricio Breternitz Jr., E. John, L. John, P. Lima, F. M. G. França
14 Oct 2024
Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
Tags: GNN
09 Apr 2024
Communication Compression for Byzantine Robust Learning: New Efficient Algorithms and Improved Rates
Ahmad Rammal, Kaja Gruntkowska, Nikita Fedin, Eduard A. Gorbunov, Peter Richtárik
15 Oct 2023
Distributed Extra-gradient with Optimal Complexity and Communication Guarantees
Ali Ramezani-Kebrya, Kimon Antonakopoulos, Igor Krawczuk, Justin Deschenaux, V. Cevher
17 Aug 2023
Self-Distilled Quantization: Achieving High Compression Rates in Transformer-Based Language Models
James O'Neill, Sourav Dutta
Tags: VLM, MQ
12 Jul 2023
FedREP: A Byzantine-Robust, Communication-Efficient and Privacy-Preserving Framework for Federated Learning
Yi-Rui Yang, Kun Wang, Wulu Li
Tags: FedML
09 Mar 2023
Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression
Jaeyong Song, Jinkyu Yim, Jaewon Jung, Hongsun Jang, H. Kim, Youngsok Kim, Jinho Lee
Tags: GNN
24 Jan 2023
Adaptive Compression for Communication-Efficient Distributed Training
Maksim Makarenko, Elnur Gasanov, Rustem Islamov, Abdurakhmon Sadiev, Peter Richtárik
31 Oct 2022
L-GreCo: Layerwise-Adaptive Gradient Compression for Efficient and Accurate Deep Learning
Mohammadreza Alimohammadi, I. Markov, Elias Frantar, Dan Alistarh
31 Oct 2022
lo-fi: distributed fine-tuning without communication
Mitchell Wortsman, Suchin Gururangan, Shen Li, Ali Farhadi, Ludwig Schmidt, Michael G. Rabbat, Ari S. Morcos
19 Oct 2022
Empirical Analysis on Top-k Gradient Sparsification for Distributed Deep Learning in a Supercomputing Environment
Daegun Yoon, Sangyoon Oh
18 Sep 2022
Fine-tuning Language Models over Slow Networks using Activation Compression with Guarantees
Jue Wang, Binhang Yuan, Luka Rimanic, Yongjun He, Tri Dao, Beidi Chen, Christopher Ré, Ce Zhang
Tags: AI4CE
02 Jun 2022
Communication-Efficient Distributed Learning via Sparse and Adaptive Stochastic Gradient
Xiaoge Deng, Dongsheng Li, Tao Sun, Xicheng Lu
Tags: FedML
08 Dec 2021
What Do We Mean by Generalization in Federated Learning?
Honglin Yuan, Warren Morningstar, Lin Ning, K. Singhal
Tags: OOD, FedML
27 Oct 2021
Fundamental limits of over-the-air optimization: Are analog schemes optimal?
Shubham K. Jha, Prathamesh Mayekar, Himanshu Tyagi
11 Sep 2021
NUQSGD: Provably Communication-efficient Data-parallel SGD via Nonuniform Quantization
Ali Ramezani-Kebrya, Fartash Faghri, Ilya Markov, V. Aksenov, Dan Alistarh, Daniel M. Roy
Tags: MQ
28 Apr 2021
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Max Ryabinin, Eduard A. Gorbunov, Vsevolod Plokhotnyuk, Gennady Pekhimenko
04 Mar 2021