Sparsified SGD with Memory
Sebastian U. Stich, Jean-Baptiste Cordonnier, Martin Jaggi
arXiv: 1809.07599 (v2) · 20 September 2018
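For readers skimming the citation list below, the method itself is compact: at each step the worker adds its accumulated compression error back onto the fresh stochastic gradient, applies (or transmits) only the k largest-magnitude coordinates, and stores the dropped remainder in a local memory. A minimal single-worker NumPy sketch of this error-feedback loop follows; the toy quadratic objective, function names, and hyperparameters are illustrative choices, not taken from the paper.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def sparsified_sgd_with_memory(grad_fn, x0, lr=0.1, k=2, steps=500):
    """SGD where only a k-sparse update is applied per step; the dropped
    coordinates accumulate in a memory term that is added back before
    the next sparsification (error feedback)."""
    x = x0.copy()
    m = np.zeros_like(x0)      # memory of past compression error
    for _ in range(steps):
        g = grad_fn(x)
        v = m + lr * g         # add back the remembered error
        u = top_k(v, k)        # apply/transmit only k coordinates
        m = v - u              # remember what was dropped
        x -= u
    return x

# Toy usage: minimize ||x||^2 / 2, whose gradient is simply x.
x_final = sparsified_sgd_with_memory(lambda x: x, np.ones(10), lr=0.1, k=2)
print(np.linalg.norm(x_final))  # should be close to 0
```

Without the memory term, coordinates that never make the top-k would simply be lost; the memory guarantees they are eventually applied, which is what drives the paper's convergence analysis.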
Papers citing "Sparsified SGD with Memory" (37 of 37 shown)
FedFetch: Faster Federated Learning with Adaptive Downstream Prefetching
Qifan Yan, Andrew Liu, Shiqi He, Mathias Lécuyer, Ivan Beschastnikh · FedML · 21 Apr 2025 · 0 citations

QLESS: A Quantized Approach for Data Valuation and Selection in Large Language Model Fine-Tuning
Moses Ananta, Muhammad Farid Adilazuarda, Zayd Muhammad Kawakibi Zuhri, Ayu Purwarianti, Alham Fikri Aji · MQ · 03 Feb 2025 · 0 citations

Sketched Adaptive Federated Deep Learning: A Sharp Convergence Analysis
Zhijie Chen, Qiaobo Li, A. Banerjee · FedML · 11 Nov 2024 · 0 citations

Trustworthiness of Stochastic Gradient Descent in Distributed Learning
Hongyang Li, Caesar Wu, Mohammed Chadli, Said Mammar, Pascal Bouvry · 28 Oct 2024 · 1 citation

Collaborative and Efficient Personalization with Mixtures of Adaptors
Abdulla Jasem Almansoori, Samuel Horváth, Martin Takáč · FedML · 04 Oct 2024 · 3 citations

Communication-efficient Vertical Federated Learning via Compressed Error Feedback
Pedro Valdeira, João Xavier, Cláudia Soares, Yuejie Chi · FedML · 20 Jun 2024 · 4 citations

Achieving Dimension-Free Communication in Federated Learning via Zeroth-Order Optimization
Zhe Li, Bicheng Ying, Zidong Liu, Chaosheng Dong, Haibo Yang · FedML · 24 May 2024 · 3 citations

SADDLe: Sharpness-Aware Decentralized Deep Learning with Heterogeneous Data
Sakshi Choudhary, Sai Aparna Aketi, Kaushik Roy · FedML · 22 May 2024 · 0 citations

Revolutionizing Wireless Networks with Federated Learning: A Comprehensive Review
Sajjad Emdadi Mahdimahalleh · AI4CE · 01 Aug 2023 · 1 citation

Lower Bounds and Accelerated Algorithms in Distributed Stochastic Optimization with Communication Compression
Yutong He, Xinmeng Huang, Yiming Chen, W. Yin, Kun Yuan · 12 May 2023 · 7 citations

Communication optimization strategies for distributed deep neural network training: A survey
Shuo Ouyang, Dezun Dong, Yemao Xu, Liquan Xiao · 06 Mar 2020 · 12 citations

The Convergence of Sparsified Gradient Methods
Dan Alistarh, Torsten Hoefler, M. Johansson, Sarit Khirirat, Nikola Konstantinov, Cédric Renggli · 27 Sep 2018 · 494 citations

Error Compensated Quantized SGD and its Applications to Large-scale Distributed Optimization
Jiaxiang Wu, Weidong Huang, Junzhou Huang, Tong Zhang · 21 Jun 2018 · 236 citations

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich · FedML · 24 May 2018 · 1,067 citations

The Convergence of Stochastic Gradient Descent in Asynchronous Shared Memory
Dan Alistarh, Christopher De Sa, Nikola Konstantinov · 23 Mar 2018 · 42 citations

Communication Compression for Decentralized Training
Hanlin Tang, Shaoduo Gan, Ce Zhang, Tong Zhang, Ji Liu · 17 Mar 2018 · 273 citations

Improved asynchronous parallel optimization analysis for stochastic incremental methods
Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien · 11 Jan 2018 · 70 citations

AdaComp: Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen, Jungwook Choi, D. Brand, A. Agrawal, Wei Zhang, K. Gopalakrishnan · ODL · 07 Dec 2017 · 174 citations

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Chengyue Wu, Song Han, Huizi Mao, Yu Wang, W. Dally · 05 Dec 2017 · 1,407 citations

Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni, Jialei Wang, Ji Liu, Tong Zhang · 26 Oct 2017 · 528 citations

meProp: Sparsified Back Propagation for Accelerated Deep Learning with Reduced Overfitting
Xu Sun, Xuancheng Ren, Shuming Ma, Houfeng Wang · 19 Jun 2017 · 157 citations

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He · 3DH · 08 Jun 2017 · 3,685 citations

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen, Cong Xu, Feng Yan, Chunpeng Wu, Yandan Wang, Yiran Chen, Hai Helen Li · 22 May 2017 · 990 citations

Sparse Communication for Distributed Gradient Descent
Alham Fikri Aji, Kenneth Heafield · 17 Apr 2017 · 741 citations

QSGD: Communication-Efficient SGD via Gradient Quantization and Encoding
Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, Milan Vojnović · MQ · 07 Oct 2016 · 423 citations

ASAGA: Asynchronous Parallel SAGA
Rémi Leblond, Fabian Pedregosa, Simon Lacoste-Julien · AI4TS · 15 Jun 2016 · 21 citations

Perturbed Iterate Analysis for Asynchronous Stochastic Optimization
Horia Mania, Xinghao Pan, Dimitris Papailiopoulos, Benjamin Recht, Kannan Ramchandran, Michael I. Jordan · 24 Jul 2015 · 233 citations

Taming the Wild: A Unified Analysis of Hogwild!-Style Algorithms
Christopher De Sa, Ce Zhang, K. Olukotun, Christopher Ré · 22 Jun 2015 · 204 citations

PASSCoDe: Parallel ASynchronous Stochastic dual Co-ordinate Descent
Cho-Jui Hsieh, Hsiang-Fu Yu, Inderjit S. Dhillon · 06 Apr 2015 · 108 citations

Deep Learning with Limited Numerical Precision
Suyog Gupta, A. Agrawal, K. Gopalakrishnan, P. Narayanan · HAI · 09 Feb 2015 · 2,049 citations

Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba · ODL · 22 Dec 2014 · 150,260 citations

Stochastic Optimization with Importance Sampling
P. Zhao, Tong Zhang · 13 Jan 2014 · 345 citations

Minimizing Finite Sums with the Stochastic Average Gradient
Mark Schmidt, Nicolas Le Roux, Francis R. Bach · 10 Sep 2013 · 1,249 citations

A simpler approach to obtaining an O(1/t) convergence rate for the projected stochastic subgradient method
Simon Lacoste-Julien, Mark Schmidt, Francis R. Bach · 10 Dec 2012 · 260 citations

Stochastic Gradient Descent for Non-smooth Optimization: Convergence Results and Optimal Averaging Schemes
Ohad Shamir, Tong Zhang · 08 Dec 2012 · 576 citations

Making Gradient Descent Optimal for Strongly Convex Stochastic Optimization
Alexander Rakhlin, Ohad Shamir, Karthik Sridharan · 26 Sep 2011 · 768 citations

HOGWILD!: A Lock-Free Approach to Parallelizing Stochastic Gradient Descent
Feng Niu, Benjamin Recht, Christopher Ré, Stephen J. Wright · 28 Jun 2011 · 2,273 citations