Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification

29 October 2020
Saurabh Agarwal
Hongyi Wang
Kangwook Lee
Shivaram Venkataraman
Dimitris Papailiopoulos

Papers citing "Accordion: Adaptive Gradient Communication via Critical Learning Regime Identification"

43 / 43 papers shown

The Early Phase of Neural Network Training
Jonathan Frankle
D. Schwab
Ari S. Morcos
24 Feb 2020

The Break-Even Point on Optimization Trajectories of Deep Neural Networks
Stanislaw Jastrzebski
Maciej Szymczak
Stanislav Fort
Devansh Arpit
Jacek Tabor
Kyunghyun Cho
Krzysztof J. Geras
21 Feb 2020

Overlap Local-SGD: An Algorithmic Approach to Hide Communication Delays in Distributed SGD
Jianyu Wang
Hao Liang
Gauri Joshi
21 Feb 2020

PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
03 Dec 2019

Understanding Top-k Sparsification in Distributed Deep Learning
Shaoshuai Shi
Xiaowen Chu
Ka Chun Cheung
Simon See
20 Nov 2019

MLPerf Training Benchmark
Arya D. McCarthy
Christine Cheng
Cody Coleman
Greg Diamos
Paulius Micikevicius
...
Carole-Jean Wu
Lingjie Xu
Masafumi Yamazaki
C. Young
Matei A. Zaharia
02 Oct 2019

Deep Learning Recommendation Model for Personalization and Recommendation Systems
Maxim Naumov
Dheevatsa Mudigere
Hao-Jun Michael Shi
Jianyu Huang
Narayanan Sundaraman
...
Wenlin Chen
Vijay Rao
Bill Jia
Liang Xiong
M. Smelyanskiy
31 May 2019

PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization
Thijs Vogels
Sai Praneeth Karimireddy
Martin Jaggi
31 May 2019

Distributed Learning with Sublinear Communication
Jayadev Acharya
Christopher De Sa
Dylan J. Foster
Karthik Sridharan
FedML
28 Feb 2019

A Distributed Synchronous SGD Algorithm with Global Top-$k$ Sparsification for Low Bandwidth Networks
Shaoshuai Shi
Qiang-qiang Wang
Kaiyong Zhao
Zhenheng Tang
Yuxin Wang
Xiang Huang
Xiaowen Chu
14 Jan 2019

Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
12 Dec 2018

Deep Learning on Graphs: A Survey
Ziwei Zhang
Peng Cui
Wenwu Zhu
GNN
11 Dec 2018

On the Computational Inefficiency of Large Batch Sizes for Stochastic Gradient Descent
Noah Golmant
N. Vemuri
Z. Yao
Vladimir Feinberg
A. Gholami
Kai Rothauge
Michael W. Mahoney
Joseph E. Gonzalez
30 Nov 2018

GradiVeQ: Vector Quantization for Bandwidth-Efficient Gradient Aggregation in Distributed CNN Training
Timo C. Wunderlich
Zhifeng Lin
S. A. Aamir
Andreas Grübl
Youjie Li
David Stöckel
Alex Schwing
M. Annavaram
A. Avestimehr
MQ
08 Nov 2018

Measuring the Effects of Data Parallelism on Neural Network Training
Christopher J. Shallue
Jaehoon Lee
J. Antognini
J. Mamou
J. Ketterling
Yao Wang
08 Nov 2018

Adaptive Communication Strategies to Achieve the Best Error-Runtime Trade-off in Local-Update SGD
Jianyu Wang
Gauri Joshi
FedML
19 Oct 2018

Large batch size training of neural networks with adversarial training and second-order information
Z. Yao
A. Gholami
Daiyaan Arfeen
Richard Liaw
Joseph E. Gonzalez
Kurt Keutzer
Michael W. Mahoney
ODL
02 Oct 2018

Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin
Sebastian U. Stich
Kumar Kshitij Patel
Martin Jaggi
22 Aug 2018

On the Relation Between the Sharpest Directions of DNN Loss and the SGD Step Length
Stanislaw Jastrzebski
Zachary Kenton
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
ODL
13 Jul 2018

ATOMO: Communication-efficient Learning via Atomic Sparsification
Hongyi Wang
Scott Sievert
Zachary B. Charles
Shengchao Liu
S. Wright
Dimitris Papailiopoulos
11 Jun 2018

Local SGD Converges Fast and Communicates Little
Sebastian U. Stich
FedML
24 May 2018

Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD
Sanghamitra Dutta
Gauri Joshi
Soumyadip Ghosh
Parijat Dube
P. Nagpurkar
03 Mar 2018

Hessian-based Analysis of Large Batch Training and Robustness to Adversaries
Z. Yao
A. Gholami
Qi Lei
Kurt Keutzer
Michael W. Mahoney
22 Feb 2018

Horovod: fast and easy distributed deep learning in TensorFlow
Alexander Sergeev
Mike Del Balso
15 Feb 2018

signSGD: Compressed Optimisation for Non-Convex Problems
Jeremy Bernstein
Yu Wang
Kamyar Azizzadenesheli
Anima Anandkumar
FedML
ODL
13 Feb 2018

AdaComp : Adaptive Residual Gradient Compression for Data-Parallel Distributed Training
Chia-Yu Chen
Jungwook Choi
D. Brand
A. Agrawal
Wei Zhang
K. Gopalakrishnan
ODL
07 Dec 2017

AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks
Aditya Devarakonda
Maxim Naumov
M. Garland
ODL
06 Dec 2017

Deep Gradient Compression: Reducing the Communication Bandwidth for Distributed Training
Yujun Lin
Song Han
Huizi Mao
Yu Wang
W. Dally
05 Dec 2017

Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
01 Nov 2017

Gradient Sparsification for Communication-Efficient Distributed Optimization
Jianqiao Wangni
Jialei Wang
Ji Liu
Tong Zhang
26 Oct 2017

Squeeze-and-Excitation Networks
Jie Hu
Li Shen
Samuel Albanie
Gang Sun
Enhua Wu
05 Sep 2017

Gradient Diversity: a Key Ingredient for Scalable Distributed Learning
Dong Yin
A. Pananjady
Max Lam
Dimitris Papailiopoulos
Kannan Ramchandran
Peter L. Bartlett
18 Jun 2017

Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
12 Jun 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
08 Jun 2017

Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
24 May 2017

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning
W. Wen
Cong Xu
Feng Yan
Chunpeng Wu
Yandan Wang
Yiran Chen
Hai Helen Li
22 May 2017

Sparse Communication for Distributed Gradient Descent
Alham Fikri Aji
Kenneth Heafield
17 Apr 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
15 Sep 2016

Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
25 Aug 2016

Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
10 Dec 2015

FireCaffe: near-linear acceleration of deep neural network training on compute clusters
F. Iandola
Khalid Ashraf
Matthew W. Moskewicz
Kurt Keutzer
31 Oct 2015

Going Deeper with Convolutions
Christian Szegedy
Wei Liu
Yangqing Jia
P. Sermanet
Scott E. Reed
Dragomir Anguelov
D. Erhan
Vincent Vanhoucke
Andrew Rabinovich
17 Sep 2014

Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
04 Sep 2014