arXiv:1810.07288
Cited By
Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
16 October 2018
Sharan Vaswani
Francis R. Bach
Mark Schmidt
Papers citing "Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron" (28 of 28 papers shown)

Title | Authors | Tag | Metrics | Date
1. Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance | Dimitris Oikonomou, Nicolas Loizou | - | 64 / 5 / 0 | 06 Jun 2024
2. Demystifying SGD with Doubly Stochastic Gradients | Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner | - | 79 / 1 / 0 | 03 Jun 2024
3. Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation | Aaron Mishkin, Mert Pilanci, Mark Schmidt | - | 93 / 1 / 0 | 03 Apr 2024
4. Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance | Qi Zhang, Yi Zhou, Shaofeng Zou | - | 84 / 5 / 0 | 01 Apr 2024
5. Critical Influence of Overparameterization on Sharpness-aware Minimization | Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee | AAML | 80 / 1 / 0 | 29 Nov 2023
6. Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning | Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, M. Mohri, Sashank J. Reddi, Sebastian U. Stich, A. Suresh | FedML | 69 / 217 / 0 | 08 Aug 2020
7. SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation | Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou | - | 71 / 75 / 0 | 18 Jun 2020
8. An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias | Lu Yu, Krishnakumar Balasubramanian, S. Volgushev, Murat A. Erdogdu | - | 75 / 50 / 0 | 14 Jun 2020
9. The Implicit Regularization of Stochastic Gradient Flow for Least Squares | Alnur Ali, Yan Sun, Robert Tibshirani | - | 59 / 77 / 0 | 17 Mar 2020
10. On exponential convergence of SGD in non-convex over-parametrized learning | Xinhai Liu, M. Belkin, Yu-Shen Liu | - | 59 / 101 / 0 | 06 Nov 2018
11. Accelerating SGD with momentum for over-parameterized learning | Chaoyue Liu, M. Belkin | ODL | 20 / 19 / 0 | 31 Oct 2018
12. An Alternative View: When Does SGD Escape Local Minima? | Robert D. Kleinberg, Yuanzhi Li, Yang Yuan | MLT | 57 / 316 / 0 | 17 Feb 2018
13. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning | Siyuan Ma, Raef Bassily, M. Belkin | - | 48 / 289 / 0 | 18 Dec 2017
14. Natasha 2: Faster Non-Convex Optimization Than SGD | Zeyuan Allen-Zhu | ODL | 64 / 245 / 0 | 29 Aug 2017
15. Theoretical insights into the optimization landscape of over-parameterized shallow neural networks | Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee | - | 118 / 417 / 0 | 16 Jul 2017
16. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour | Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He | 3DH | 94 / 3,666 / 0 | 08 Jun 2017
17. Convergence Analysis of Two-layer Neural Networks with ReLU Activation | Yuanzhi Li, Yang Yuan | MLT | 106 / 650 / 0 | 28 May 2017
18. Understanding deep learning requires rethinking generalization | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals | HAI | 276 / 4,620 / 0 | 10 Nov 2016
19. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition | Hamed Karimi, J. Nutini, Mark Schmidt | - | 223 / 1,208 / 0 | 16 Aug 2016
20. Katyusha: The First Direct Acceleration of Stochastic Gradient Methods | Zeyuan Allen-Zhu | ODL | 85 / 580 / 0 | 18 Mar 2016
21. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization | Roy Frostig, Rong Ge, Sham Kakade, Aaron Sidford | - | 50 / 150 / 0 | 24 Jun 2015
22. Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems | Yuxin Chen, Emmanuel J. Candes | - | 58 / 589 / 0 | 19 May 2015
23. Adam: A Method for Stochastic Optimization | Diederik P. Kingma, Jimmy Ba | ODL | 940 / 149,474 / 0 | 22 Dec 2014
24. Guaranteed Matrix Completion via Non-convex Factorization | Ruoyu Sun, Zhi-Quan Luo | - | 65 / 452 / 0 | 28 Nov 2014
25. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives | Aaron Defazio, Francis R. Bach, Simon Lacoste-Julien | ODL | 110 / 1,817 / 0 | 01 Jul 2014
26. Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming | Saeed Ghadimi, Guanghui Lan | ODL | 90 / 1,538 / 0 | 22 Sep 2013
27. Minimizing Finite Sums with the Stochastic Average Gradient | Mark Schmidt, Nicolas Le Roux, Francis R. Bach | - | 255 / 1,246 / 0 | 10 Sep 2013
28. ADADELTA: An Adaptive Learning Rate Method | Matthew D. Zeiler | ODL | 115 / 6,619 / 0 | 22 Dec 2012