arXiv:1810.07288
Cited By
Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron
16 October 2018
Sharan Vaswani
Francis R. Bach
Mark Schmidt
Papers citing "Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron" (28 of 28 papers shown)

Title | Authors | Tag | Metrics | Date
1. Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance | Dimitris Oikonomou, Nicolas Loizou | - | 64 / 5 / 0 | 06 Jun 2024
2. Demystifying SGD with Doubly Stochastic Gradients | Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner | - | 79 / 1 / 0 | 03 Jun 2024
3. Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation | Aaron Mishkin, Mert Pilanci, Mark Schmidt | - | 93 / 1 / 0 | 03 Apr 2024
4. Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance | Qi Zhang, Yi Zhou, Shaofeng Zou | - | 84 / 5 / 0 | 01 Apr 2024
5. Critical Influence of Overparameterization on Sharpness-aware Minimization | Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee | AAML | 80 / 1 / 0 | 29 Nov 2023
6. Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning | Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, M. Mohri, Sashank J. Reddi, Sebastian U. Stich, A. Suresh | FedML | 69 / 217 / 0 | 08 Aug 2020
7. SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation | Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou | - | 71 / 75 / 0 | 18 Jun 2020
8. An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias | Lu Yu, Krishnakumar Balasubramanian, S. Volgushev, Murat A. Erdogdu | - | 75 / 50 / 0 | 14 Jun 2020
9. The Implicit Regularization of Stochastic Gradient Flow for Least Squares | Alnur Ali, Yan Sun, Robert Tibshirani | - | 59 / 77 / 0 | 17 Mar 2020
10. On exponential convergence of SGD in non-convex over-parametrized learning | Xinhai Liu, M. Belkin, Yu-Shen Liu | - | 59 / 101 / 0 | 06 Nov 2018
11. Accelerating SGD with momentum for over-parameterized learning | Chaoyue Liu, M. Belkin | ODL | 20 / 19 / 0 | 31 Oct 2018
12. An Alternative View: When Does SGD Escape Local Minima? | Robert D. Kleinberg, Yuanzhi Li, Yang Yuan | MLT | 57 / 316 / 0 | 17 Feb 2018
13. The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning | Siyuan Ma, Raef Bassily, M. Belkin | - | 48 / 289 / 0 | 18 Dec 2017
14. Natasha 2: Faster Non-Convex Optimization Than SGD | Zeyuan Allen-Zhu | ODL | 64 / 245 / 0 | 29 Aug 2017
15. Theoretical insights into the optimization landscape of over-parameterized shallow neural networks | Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee | - | 118 / 417 / 0 | 16 Jul 2017
16. Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour | Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He | 3DH | 94 / 3,666 / 0 | 08 Jun 2017
17. Convergence Analysis of Two-layer Neural Networks with ReLU Activation | Yuanzhi Li, Yang Yuan | MLT | 106 / 650 / 0 | 28 May 2017
18. Understanding deep learning requires rethinking generalization | Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals | HAI | 276 / 4,620 / 0 | 10 Nov 2016
19. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition | Hamed Karimi, J. Nutini, Mark Schmidt | - | 223 / 1,208 / 0 | 16 Aug 2016
20. Katyusha: The First Direct Acceleration of Stochastic Gradient Methods | Zeyuan Allen-Zhu | ODL | 85 / 580 / 0 | 18 Mar 2016
21. Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization | Roy Frostig, Rong Ge, Sham Kakade, Aaron Sidford | - | 50 / 150 / 0 | 24 Jun 2015
22. Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems | Yuxin Chen, Emmanuel J. Candes | - | 58 / 589 / 0 | 19 May 2015
23. Adam: A Method for Stochastic Optimization | Diederik P. Kingma, Jimmy Ba | ODL | 940 / 149,474 / 0 | 22 Dec 2014
24. Guaranteed Matrix Completion via Non-convex Factorization | Ruoyu Sun, Zhi-Quan Luo | - | 65 / 452 / 0 | 28 Nov 2014
25. SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives | Aaron Defazio, Francis R. Bach, Simon Lacoste-Julien | ODL | 110 / 1,817 / 0 | 01 Jul 2014
26. Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming | Saeed Ghadimi, Guanghui Lan | ODL | 90 / 1,538 / 0 | 22 Sep 2013
27. Minimizing Finite Sums with the Stochastic Average Gradient | Mark Schmidt, Nicolas Le Roux, Francis R. Bach | - | 255 / 1,246 / 0 | 10 Sep 2013
28. ADADELTA: An Adaptive Learning Rate Method | Matthew D. Zeiler | ODL | 115 / 6,619 / 0 | 22 Dec 2012