YellowFin and the Art of Momentum Tuning

12 June 2017

Papers citing "YellowFin and the Art of Momentum Tuning"

| Title | Authors | Tag | | | | Date |
|---|---|---|---|---|---|---|
| Robust Compressed Sensing using Generative Models | A. Jalal, Liu Liu, A. Dimakis, C. Caramanis | | 21 | 39 | 0 | 16 Jun 2020 |
| Pipelined Backpropagation at Scale: Training Large Models without Batches | Atli Kosson, Vitaliy Chiley, Abhinav Venigalla, Joel Hestness, Urs Koster | | 35 | 33 | 0 | 25 Mar 2020 |
| Demon: Improved Neural Network Training with Momentum Decay | John Chen, Cameron R. Wolfe, Zhaoqi Li, Anastasios Kyrillidis | ODL | 24 | 15 | 0 | 11 Oct 2019 |
| On the adequacy of untuned warmup for adaptive optimization | Jerry Ma, Denis Yarats | | 51 | 70 | 0 | 09 Oct 2019 |
| Optimizing Multi-GPU Parallelization Strategies for Deep Learning Training | Saptadeep Pal, Eiman Ebrahimi, A. Zulfiqar, Yaosheng Fu, Victor Zhang, Szymon Migacz, D. Nellans, Puneet Gupta | | 34 | 55 | 0 | 30 Jul 2019 |
| Reducing the variance in online optimization by transporting past gradients | Sébastien M. R. Arnold, Pierre-Antoine Manzagol, Reza Babanezhad, Ioannis Mitliagkas, Nicolas Le Roux | | 21 | 28 | 0 | 08 Jun 2019 |
| Segmentation of Roots in Soil with U-Net | Abraham George Smith, Jens Petersen, Raghavendra Selvan, C. Rasmussen | | 11 | 122 | 0 | 28 Feb 2019 |
| Quasi-hyperbolic momentum and Adam for deep learning | Jerry Ma, Denis Yarats | ODL | 84 | 129 | 0 | 16 Oct 2018 |
| Cooperative SGD: A unified Framework for the Design and Analysis of Communication-Efficient SGD Algorithms | Jianyu Wang, Gauri Joshi | | 13 | 348 | 0 | 22 Aug 2018 |
| On the insufficiency of existing momentum schemes for Stochastic Optimization | Rahul Kidambi, Praneeth Netrapalli, Prateek Jain, Sham Kakade | ODL | 17 | 117 | 0 | 15 Mar 2018 |
| Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD | Sanghamitra Dutta, Gauri Joshi, Soumyadip Ghosh, Parijat Dube, P. Nagpurkar | | 12 | 193 | 0 | 03 Mar 2018 |
| Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis | Tal Ben-Nun, Torsten Hoefler | GNN | 30 | 701 | 0 | 26 Feb 2018 |
| SparCML: High-Performance Sparse Communication for Machine Learning | Cédric Renggli, Saleh Ashkboos, Mehdi Aghagolzadeh, Dan Alistarh, Torsten Hoefler | | 21 | 126 | 0 | 22 Feb 2018 |
| Intriguing Properties of Randomly Weighted Networks: Generalizing While Learning Next to Nothing | Amir Rosenfeld, John K. Tsotsos | MLT | 24 | 51 | 0 | 02 Feb 2018 |
| Momentum and Stochastic Momentum for Stochastic Gradient, Newton, Proximal Point and Subspace Descent Methods | Nicolas Loizou, Peter Richtárik | | 17 | 199 | 0 | 27 Dec 2017 |
| The Robust Manifold Defense: Adversarial Training using Generative Models | A. Jalal, Andrew Ilyas, C. Daskalakis, A. Dimakis | AAML | 31 | 174 | 0 | 26 Dec 2017 |
| Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition | Hamed Karimi, J. Nutini, Mark W. Schmidt | | 139 | 1,199 | 0 | 16 Aug 2016 |