Sparsified SGD with Memory

20 September 2018
Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
ArXiv (abs) · PDF · HTML
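
For orientation: the titular method combines top-k gradient sparsification with an error-feedback memory, where coordinates dropped by the compressor are accumulated and added back before the next compression step; the paper's analysis shows this preserves the convergence rate of plain SGD. Below is a minimal sketch of that pattern on a toy least-squares problem. The top_k helper, the problem setup, and all constants are illustrative assumptions, not the authors' reference code.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v; zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

# Toy least-squares problem f(x) = 0.5 * E_i (a_i . x - b_i)^2.
# All constants (n, d, k, lr) are illustrative choices.
rng = np.random.default_rng(0)
n, d, k, lr = 200, 50, 5, 0.01
A = rng.normal(size=(n, d))
x_star = rng.normal(size=d)
b = A @ x_star

x = np.zeros(d)
memory = np.zeros(d)                  # accumulated compression error
for _ in range(5000):
    i = rng.integers(n)               # sample one data point
    g = (A[i] @ x - b[i]) * A[i]      # stochastic gradient
    v = memory + lr * g               # add back previously dropped mass
    update = top_k(v, k)              # only k coordinates are applied/sent
    memory = v - update               # store the residual for later rounds
    x = x - update

print("distance to x*:", np.linalg.norm(x - x_star))
```

The invariant worth noting: nothing is discarded permanently. Whatever top_k drops in one round stays in memory and is retried in later rounds, so the compressed updates recover the full gradient signal over time.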

Papers citing "Sparsified SGD with Memory"

37 / 37 papers shown

FedFetch: Faster Federated Learning with Adaptive Downstream Prefetching
Qifan Yan, Andrew Liu, Shiqi He, Mathias Lécuyer, Ivan Beschastnikh
FedML · 152 · 0 · 0 · 21 Apr 2025

QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning
Moses Ananta, Muhammad Farid Adilazuarda, Zayd Muhammad Kawakibi Zuhri, Ayu Purwarianti, Alham Fikri Aji
MQ · 126 · 0 · 0 · 03 Feb 2025

Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Zhijie Chen, Qiaobo Li, A. Banerjee
FedML · 90 · 0 · 0 · 11 Nov 2024

Trustworthiness of Stochastic Gradient Descent in Distributed Learning
Hongyang Li, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry
82 · 1 · 0 · 28 Oct 2024

Collaborative and Efficient Personalization with Mixtures of Adaptors
Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč
FedML · 79 · 3 · 0 · 04 Oct 2024

Communication-efficient Vertical Federated Learning via Compressed Error Feedback
Pedro Valdeira, João Xavier, Cláudia Soares, Yuejie Chi
FedML · 89 · 4 · 0 · 20 Jun 2024

Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Zhe Li, Bicheng Ying, Zidong Liu, Chaosheng Dong, Haibo Yang
FedML · 118 · 3 · 0 · 24 May 2024

SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data
Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy
FedML · 87 · 0 · 0 · 22 May 2024

Revolutionizing Wireless Networks with Federated Learning: A Comprehensive Review
Sajjad Emdadi Mahdimahalleh
AI4CE · 78 · 1 · 0 · 01 Aug 2023

Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression
Yutong He, Xinmeng Huang, Yiming Chen, W. Yin, Kun Yuan
72 · 7 · 0 · 12 May 2023

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao
82 · 12 · 0 · 06 Mar 2020

The Convergence of Sparsified Gradient Methods
Dan Alistarh, Torsten Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli
167 · 494 · 0 · 27 Sep 2018

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang
81 · 236 · 0 · 21 Jun 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML · 183 · 1,067 · 0 · 24 May 2018

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
Dan Alistarh, Christopher De Sa, Nikola Konstantinov
41 · 42 · 0 · 23 Mar 2018

Communication Compression for Decentralized Training
Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu
65 · 273 · 0 · 17 Mar 2018

Improved asynchronous parallel optimization analysis for stochastic incremental methods
Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien
65 · 70 · 0 · 11 Jan 2018

AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan
ODL · 49 · 174 · 0 · 07 Dec 2017

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally
138 · 1,407 · 0 · 05 Dec 2017

Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang
90 · 528 · 0 · 26 Oct 2017

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang
65 · 157 · 0 · 19 Jun 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He
3DH · 128 · 3,685 · 0 · 08 Jun 2017

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li
142 · 990 · 0 · 22 May 2017

Sparse Communication for Distributed Gradient Descent
Alham Fikri Aji, Kenneth Heafield
66 · 741 · 0 · 17 Apr 2017

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnović
MQ · 66 · 423 · 0 · 07 Oct 2016

ASAGA: Asynchronous Parallel SAGA
Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien
AI4TS · 65 · 21 · 0 · 15 Jun 2016

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization
Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan
94 · 233 · 0 · 24 Jul 2015

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms
Christopher De Sa, Ce Zhang, K. Olukotun, Christopher Ré
80 · 204 · 0 · 22 Jun 2015

PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent
Cho-Jui Hsieh, Hsiang-Fu Yu, Inderjit S. Dhillon
70 · 108 · 0 · 06 Apr 2015

Deep Learning with Limited Numerical Precision
Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan
HAI · 207 · 2,049 · 0 · 09 Feb 2015

Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
ODL · 2.0K · 150,260 · 0 · 22 Dec 2014

Stochastic Optimization with Importance Sampling
P. Zhao, Tong Zhang
96 · 345 · 0 · 13 Jan 2014

Minimizing Finite Sums with the Stochastic Average Gradient
Mark Schmidt, Nicolas Le Roux, Francis R. Bach
324 · 1,249 · 0 · 10 Sep 2013

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien, Mark Schmidt, Francis R. Bach
185 · 260 · 0 · 10 Dec 2012

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Ohad Shamir, Tong Zhang
153 · 576 · 0 · 08 Dec 2012

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Alexander Rakhlin, Ohad Shamir, Karthik Sridharan
169 · 768 · 0 · 26 Sep 2011

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright
201 · 2,273 · 0 · 28 Jun 2011