ResearchTrend.AI

Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron

Sharan Vaswani, Francis R. Bach, Mark Schmidt
16 October 2018 · arXiv:1810.07288
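
The paper's central setting is interpolation: an over-parameterized model that can drive every individual training loss to zero, in which case SGD with a constant step size converges as fast as full gradient descent. Below is a minimal illustrative sketch of that regime (not the authors' code; the problem sizes and iteration count are arbitrary choices) on an over-parameterized least-squares problem, using a constant step size of 1/L_max, where L_max is the largest per-sample smoothness constant.

    import numpy as np

    # Illustrative sketch of constant-step-size SGD under interpolation
    # (hypothetical example, not code from the paper). With fewer samples
    # than parameters, the linear system A x = b is consistent, so a
    # zero-loss interpolating solution exists.
    rng = np.random.default_rng(0)
    n, d = 20, 100                         # n < d: over-parameterized
    A = rng.standard_normal((n, d))
    b = A @ rng.standard_normal(d)         # targets from a planted model

    x = np.zeros(d)
    L_max = np.max(np.sum(A * A, axis=1))  # largest per-sample smoothness ||a_i||^2
    step = 1.0 / L_max                     # constant step size, no decay schedule

    for _ in range(5000):
        i = rng.integers(n)                # pick one sample uniformly at random
        residual = A[i] @ x - b[i]         # f_i(x) = 0.5 * (a_i^T x - b_i)^2
        x -= step * residual * A[i]        # stochastic gradient step

    print(f"final mean-squared loss: {0.5 * np.mean((A @ x - b) ** 2):.2e}")

Because the system is consistent (n < d), the loss is driven to numerical zero with no step-size decay; with n > d, where interpolation fails, the same constant-step-size loop would instead stall at a noise floor.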

Papers citing "Fast and Faster Convergence of SGD for Over-Parameterized Models and an Accelerated Perceptron"

28 / 28 papers shown. Each entry lists the title, the authors, a community tag where one is assigned, the page's three numeric counters, and the submission date.
Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance
Dimitris Oikonomou, Nicolas Loizou · 64 / 5 / 0 · 06 Jun 2024

Demystifying SGD with Doubly Stochastic Gradients
Kyurae Kim, Joohwan Ko, Yian Ma, Jacob R. Gardner · 79 / 1 / 0 · 03 Jun 2024

Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
Aaron Mishkin, Mert Pilanci, Mark Schmidt · 93 / 1 / 0 · 03 Apr 2024

Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance
Qi Zhang, Yi Zhou, Shaofeng Zou · 84 / 5 / 0 · 01 Apr 2024

Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin, Dongyeop Lee, Maksym Andriushchenko, Namhoon Lee · AAML · 80 / 1 / 0 · 29 Nov 2023

Mime: Mimicking Centralized Stochastic Algorithms in Federated Learning
Sai Praneeth Karimireddy, Martin Jaggi, Satyen Kale, M. Mohri, Sashank J. Reddi, Sebastian U. Stich, A. Suresh · FedML · 69 / 217 / 0 · 08 Aug 2020

SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation
Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou · 71 / 75 / 0 · 18 Jun 2020

An Analysis of Constant Step Size SGD in the Non-convex Regime: Asymptotic Normality and Bias
Lu Yu, Krishnakumar Balasubramanian, S. Volgushev, Murat A. Erdogdu · 75 / 50 / 0 · 14 Jun 2020

The Implicit Regularization of Stochastic Gradient Flow for Least Squares
Alnur Ali, Edgar Dobriban, Ryan J. Tibshirani · 59 / 77 / 0 · 17 Mar 2020

On exponential convergence of SGD in non-convex over-parametrized learning
Raef Bassily, M. Belkin, Siyuan Ma · 59 / 101 / 0 · 06 Nov 2018

Accelerating SGD with momentum for over-parameterized learning
Chaoyue Liu, M. Belkin · ODL · 20 / 19 / 0 · 31 Oct 2018

An Alternative View: When Does SGD Escape Local Minima?
Robert D. Kleinberg, Yuanzhi Li, Yang Yuan · MLT · 57 / 316 / 0 · 17 Feb 2018

The Power of Interpolation: Understanding the Effectiveness of SGD in Modern Over-parametrized Learning
Siyuan Ma, Raef Bassily, M. Belkin · 48 / 289 / 0 · 18 Dec 2017

Natasha 2: Faster Non-Convex Optimization Than SGD
Zeyuan Allen-Zhu · ODL · 64 / 245 / 0 · 29 Aug 2017

Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
Mahdi Soltanolkotabi, Adel Javanmard, Jason D. Lee · 118 / 417 / 0 · 16 Jul 2017

Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal, Piotr Dollár, Ross B. Girshick, P. Noordhuis, Lukasz Wesolowski, Aapo Kyrola, Andrew Tulloch, Yangqing Jia, Kaiming He · 3DH · 94 / 3,666 / 0 · 08 Jun 2017

Convergence Analysis of Two-layer Neural Networks with ReLU Activation
Yuanzhi Li, Yang Yuan · MLT · 106 / 650 / 0 · 28 May 2017

Understanding deep learning requires rethinking generalization
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals · HAI · 276 / 4,620 / 0 · 10 Nov 2016

Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi, J. Nutini, Mark Schmidt · 223 / 1,208 / 0 · 16 Aug 2016

Katyusha: The First Direct Acceleration of Stochastic Gradient Methods
Zeyuan Allen-Zhu · ODL · 85 / 580 / 0 · 18 Mar 2016

Un-regularizing: approximate proximal point and faster stochastic algorithms for empirical risk minimization
Roy Frostig, Rong Ge, Sham Kakade, Aaron Sidford · 50 / 150 / 0 · 24 Jun 2015

Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems
Yuxin Chen, Emmanuel J. Candès · 58 / 589 / 0 · 19 May 2015

Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba · ODL · 940 / 149,474 / 0 · 22 Dec 2014

Guaranteed Matrix Completion via Non-convex Factorization
Ruoyu Sun, Zhi-Quan Luo · 65 / 452 / 0 · 28 Nov 2014

SAGA: A Fast Incremental Gradient Method With Support for Non-Strongly Convex Composite Objectives
Aaron Defazio, Francis R. Bach, Simon Lacoste-Julien · ODL · 110 / 1,817 / 0 · 01 Jul 2014

Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming
Saeed Ghadimi, Guanghui Lan · ODL · 90 / 1,538 / 0 · 22 Sep 2013

Minimizing Finite Sums with the Stochastic Average Gradient
Mark Schmidt, Nicolas Le Roux, Francis R. Bach · 255 / 1,246 / 0 · 10 Sep 2013

ADADELTA: An Adaptive Learning Rate Method
Matthew D. Zeiler · ODL · 115 / 6,619 / 0 · 22 Dec 2012