Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

29 October 2020
Saurabh Agarwal
Hongyi Wang
Kangwook Lee
Shivaram Venkataraman
Dimitris Papailiopoulos

Papers citing "Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification"

43 / 43 papers shown

The Early Phase of Neural Network Training
Jonathan Frankle
D. Schwab
Ari S. Morcos
24 Feb 2020

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski
Maciej Szymczak
Stanislav Fort
Devansh Arpit
Jacek Tabor
Kyunghyun Cho
Krzysztof J. Geras
21 Feb 2020

Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Jianyu Wang
Hao Liang
Gauri Joshi
21 Feb 2020

PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
03 Dec 2019

Understanding Top-k Sparsification in Distributed Deep Learning
Shaoshuai Shi
Xiaowen Chu
Ka Chun Cheung
Simon See
20 Nov 2019

MLPerf Training Benchmark
Arya D. McCarthy
Christine Cheng
Cody Coleman
Greg Diamos
Paulius Micikevicius
...
Carole-Jean Wu
Lingjie Xu
Masafumi Yamazaki
C. Young
Matei A. Zaharia
02 Oct 2019

Deep Learning Recommendation Model for Personalization and Recommendation Systems
Maxim Naumov
Dheevatsa Mudigere
Hao-Jun Michael Shi
Jianyu Huang
Narayanan Sundaraman
...
Wenlin Chen
Vijay Rao
Bill Jia
Liang Xiong
M. Smelyanskiy
31 May 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels
Sai Praneeth Karimireddy
Martin Jaggi
31 May 2019

Distributed Learning with Sublinear Communication
Jayadev Acharya
Christopher De Sa
Dylan J. Foster
Karthik Sridharan
FedML
28 Feb 2019

A Distributed Synchronous SGD Algorithm with Global Top-$k$ Sparsification for Low Bandwidth Networks
Shaoshuai Shi
Qiang-qiang Wang
Kaiyong Zhao
Zhenheng Tang
Yuxin Wang
Xiang Huang
Xiaowen Chu
14 Jan 2019

Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
12 Dec 2018

Deep Learning on Graphs: A Survey
Ziwei Zhang
Peng Cui
Wenwu Zhu
GNN
11 Dec 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant
N. Vemuri
Z. Yao
Vladimir Feinberg
A. Gholami
Kai Rothauge
Michael W. Mahoney
Joseph E. Gonzalez
30 Nov 2018

GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
Timo C. Wunderlich
Zhifeng Lin
S. A. Aamir
Andreas Grübl
Youjie Li
David Stöckel
Alex Schwing
M. Annavaram
A. Avestimehr
MQ
08 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue
Jaehoon Lee
J. Antognini
J. Mamou
J. Ketterling
Yao Wang
08 Nov 2018

Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang
Gauri Joshi
FedML
19 Oct 2018

Large batch size training of neural networks with adversarial training and second-order information
Z. Yao
A. Gholami
Daiyaan Arfeen
Richard Liaw
Joseph E. Gonzalez
Kurt Keutzer
Michael W. Mahoney
ODL
02 Oct 2018

Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin
Sebastian U. Stich
Kumar Kshitij Patel
Martin Jaggi
22 Aug 2018

On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
Stanislaw Jastrzebski
Zachary Kenton
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
ODL
13 Jul 2018

ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang
Scott Sievert
Zachary B. Charles
Shengchao Liu
S. Wright
Dimitris Papailiopoulos
11 Jun 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
24 May 2018

Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta
Gauri Joshi
Soumyadip Ghosh
Parijat Dube
P. Nagpurkar
03 Mar 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao
A. Gholami
Qi Lei
Kurt Keutzer
Michael W. Mahoney
22 Feb 2018

Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev
Mike Del Balso
15 Feb 2018

signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein
Yu Wang
Kamyar Azizzadenesheli
Anima Anandkumar
FedML
ODL
13 Feb 2018

AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen
Jungwook Choi
D. Brand
A. Agrawal
Wei Zhang
K. Gopalakrishnan
ODL
07 Dec 2017

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
06 Dec 2017

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin
Song Han
Huizi Mao
Yu Wang
W. Dally
05 Dec 2017

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
01 Nov 2017

Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni
Jialei Wang
Ji Liu
Tong Zhang
26 Oct 2017

Squeeze-and-Excitation Networks
Jie Hu
Li Shen
Samuel Albanie
Gang Sun
Enhua Wu
05 Sep 2017

Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
18 Jun 2017

Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
12 Jun 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
08 Jun 2017

Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
24 May 2017

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen
Cong Xu
Feng Yan
Chunpeng Wu
Yandan Wang
Yiran Chen
Hai Helen Li
22 May 2017

Sparse Communication for Distributed Gradient Descent
Alham Fikri Aji
Kenneth Heafield
17 Apr 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
15 Sep 2016

Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
25 Aug 2016

Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
10 Dec 2015

FireCaffe: near-linear acceleration of deep neural network training on compute clusters
F. Iandola
Khalid Ashraf
Matthew W. Moskewicz
Kurt Keutzer
31 Oct 2015

Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
17 Sep 2014

Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
04 Sep 2014