AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Rachel A. Ward, Xiaoxia Wu, Léon Bottou
arXiv:1806.01811 · 5 June 2018 · ODL
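For context on the citing papers below: the stepsize rule analyzed in this paper is AdaGrad-Norm, plain gradient descent whose single scalar stepsize eta / b_t is adapted online via b_{t+1}^2 = b_t^2 + ||∇f(x_t)||^2, so convergence does not require knowing the smoothness constant in advance. A minimal Python sketch (the function and parameter names `adagrad_norm`, `eta`, and `b0` are illustrative, not from the paper):

```python
import numpy as np

def adagrad_norm(grad, x0, eta=1.0, b0=0.1, steps=1000):
    """Sketch of the AdaGrad-Norm update: one scalar stepsize eta / b_t,
    where b_t grows with the accumulated squared gradient norms."""
    x = np.asarray(x0, dtype=float).copy()
    b2 = b0 ** 2                       # accumulator b_0^2
    for _ in range(steps):
        g = grad(x)
        b2 += float(np.dot(g, g))      # b_{t+1}^2 = b_t^2 + ||g_t||^2
        x -= (eta / np.sqrt(b2)) * g   # x_{t+1} = x_t - (eta / b_{t+1}) g_t
    return x

# Toy usage: minimize ||x||^2 from x0 = (1, 1, 1).
x_star = adagrad_norm(lambda x: 2.0 * x, np.ones(3))
```

Because b_t is nondecreasing, the stepsize shrinks automatically as gradients accumulate, which is the self-tuning behavior most of the citing papers build on or relax.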

Papers citing "AdaGrad stepsizes: Sharp convergence over nonconvex landscapes"

Showing 50 of 54 citing papers, most recent first:
• Observability conditions for neural state-space models with eigenvalues and their roots of unity · Andrew Gracyk · 22 Apr 2025
• Adaptive Extrapolated Proximal Gradient Methods with Variance Reduction for Composite Nonconvex Finite-Sum Minimization · Ganzhao Yuan · 28 Feb 2025
• Sparklen: A Statistical Learning Toolkit for High-Dimensional Hawkes Processes in Python · Romain Edmond Lacoste · 26 Feb 2025 [GP]
• An Adaptive Stochastic Gradient Method with Non-negative Gauss-Newton Stepsizes · Antonio Orvieto, Lin Xiao · 05 Jul 2024
• Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance · Dimitris Oikonomou, Nicolas Loizou · 06 Jun 2024
• Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks · Matteo Tucat, Anirbit Mukherjee, Procheta Sen, Mingfei Sun, Omar Rivasplata · 12 Apr 2024 [MLT]
• Convergence Guarantees for RMSProp and Adam in Generalized-smooth Non-convex Optimization with Affine Noise Variance · Qi Zhang, Yi Zhou, Shaofeng Zou · 01 Apr 2024
• Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad · Sayantan Choudhury, N. Tupitsa, Nicolas Loizou, Samuel Horváth, Martin Takáč, Eduard A. Gorbunov · 05 Mar 2024
• AdaBatchGrad: Combining Adaptive Batch Size and Adaptive Step Size · P. Ostroukhov, Aigerim Zhumabayeva, Chulu Xiang, Alexander Gasnikov, Martin Takáč, Dmitry Kamzolov · 07 Feb 2024 [ODL]
• On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions · Yusu Hong, Junhong Lin · 06 Feb 2024
• How Free is Parameter-Free Stochastic Optimization? · Amit Attia, Tomer Koren · 05 Feb 2024 [ODL]
• Bridging Classical and Quantum Machine Learning: Knowledge Transfer From Classical to Quantum Neural Networks Using Knowledge Distillation · Mohammad Junayed Hasan, M.R.C. Mahdy · 23 Nov 2023
• Demystifying the Myths and Legends of Nonconvex Convergence of SGD · Aritra Dutta, El Houcine Bergou, Soumia Boucherouite, Nicklas Werge, M. Kandemir, Xin Li · 19 Oct 2023
• Convergence of AdaGrad for Non-convex Objectives: Simple Proofs and Relaxed Assumptions · Bo Wang, Huishuai Zhang, Zhirui Ma, Wei Chen · 29 May 2023
• Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods · Junchi Yang, Xiang Li, Ilyas Fatkhullin, Niao He · 21 May 2023
• Adaptive Federated Learning via New Entropy Approach · Shensheng Zheng, Wenhao Yuan, Xuehe Wang, Ling-Yu Duan · 27 Mar 2023 [FedML, OOD]
• SGD with AdaGrad Stepsizes: Full Adaptivity with High Probability to Unknown Parameters, Unbounded Gradients and Affine Variance · Amit Attia, Tomer Koren · 17 Feb 2023 [ODL]
• DoG is SGD's Best Friend: A Parameter-Free Dynamic Step Size Schedule · Maor Ivgi, Oliver Hinder, Y. Carmon · 08 Feb 2023 [ODL]
• Learning-Rate-Free Learning by D-Adaptation · Aaron Defazio, Konstantin Mishchenko · 18 Jan 2023
• Differentially Private Adaptive Optimization with Delayed Preconditioners · Tian Li, Manzil Zaheer, Ziyu Liu, Sashank J. Reddi, H. B. McMahan, Virginia Smith · 01 Dec 2022
• Adaptive Stochastic Optimisation of Nonconvex Composite Objectives · Weijia Shao, F. Sivrikaya, S. Albayrak · 21 Nov 2022
• Extra-Newton: A First Approach to Noise-Adaptive Accelerated Second-Order Methods · Kimon Antonakopoulos, Ali Kavis, V. Cevher · 03 Nov 2022 [ODL]
• TiAda: A Time-scale Adaptive Algorithm for Nonconvex Minimax Optimization · Xiang Li, Junchi Yang, Niao He · 31 Oct 2022
• On the fast convergence of minibatch heavy ball momentum · Raghu Bollapragada, Tyler Chen, Rachel A. Ward · 15 Jun 2022
• Supervised Dictionary Learning with Auxiliary Covariates · Joo-Hyun Lee, Hanbaek Lyu, W. Yao · 14 Jun 2022
• Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization · Junchi Yang, Xiang Li, Niao He · 01 Jun 2022 [ODL]
• Efficient-Adam: Communication-Efficient Distributed Adam · Congliang Chen, Li Shen, Wei Liu, Zhi-Quan Luo · 28 May 2022
• High Probability Bounds for a Class of Nonconvex Algorithms with AdaGrad Stepsize · Ali Kavis, Kfir Y. Levy, V. Cevher · 06 Apr 2022
• Adaptive Gradient Methods with Local Guarantees · Zhou Lu, Wenhan Xia, Sanjeev Arora, Elad Hazan · 02 Mar 2022 [ODL]
• A Novel Convergence Analysis for Algorithms of the Adam Family · Zhishuai Guo, Yi Tian Xu, W. Yin, R. L. Jin, Tianbao Yang · 07 Dec 2021
• Adaptive Differentially Private Empirical Risk Minimization · Xiaoxia Wu, Lingxiao Wang, Irina Cristali, Quanquan Gu, Rebecca Willett · 14 Oct 2021
• On the Convergence of Decentralized Adaptive Gradient Methods · Xiangyi Chen, Belhal Karimi, Weijie Zhao, Ping Li · 07 Sep 2021
• A Decentralized Federated Learning Framework via Committee Mechanism with Convergence Guarantee · Chunjiang Che, Xiaoli Li, Chuan Chen, Xiaoyu He, Zibin Zheng · 01 Aug 2021 [FedML]
• A Decentralized Adaptive Momentum Method for Solving a Class of Min-Max Optimization Problems · Babak Barazandeh, Tianjian Huang, George Michailidis · 10 Jun 2021
• Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective · Kushal Chakrabarti, Nikhil Chopra · 31 May 2021 [ODL, AI4CE]
• A Knowledge Graph-Enhanced Tensor Factorisation Model for Discovering Drug Targets · Cheng Ye, Rowan Swiers, Stephen Bonner, Ian P Barrett · 20 May 2021
• Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis · Stephan Wojtowytsch · 04 May 2021
• Variance Reduced Training with Stratified Sampling for Forecasting Models · Yucheng Lu, Youngsuk Park, Lifan Chen, Bernie Wang, Christopher De Sa, Dean Phillips Foster · 02 Mar 2021 [AI4TS]
• Scalable Balanced Training of Conditional Generative Adversarial Neural Networks on Image Data · Massimiliano Lupo Pasini, Vittorio Gabbi, Junqi Yin, S. Perotto, N. Laanait · 21 Feb 2021 [GAN, AI4CE]
• Block majorization-minimization with diminishing radius for constrained nonconvex optimization · Hanbaek Lyu, Yuchen Li · 07 Dec 2020
• Sequential convergence of AdaGrad algorithm for smooth convex optimization · Cheik Traoré, Edouard Pauwels · 24 Nov 2020
• Learning explanations that are hard to vary · Giambattista Parascandolo, Alexander Neitz, Antonio Orvieto, Luigi Gresele, Bernhard Schölkopf · 01 Sep 2020 [FAtt]
• Adaptive Gradient Methods for Constrained Convex Optimization and Variational Inequalities · Alina Ene, Huy Le Nguyen, Adrian Vladu · 17 Jul 2020 [ODL]
• Incremental Without Replacement Sampling in Nonconvex Optimization · Edouard Pauwels · 15 Jul 2020
• SGD for Structured Nonconvex Functions: Learning Rates, Minibatching and Interpolation · Robert Mansel Gower, Othmane Sebbouh, Nicolas Loizou · 18 Jun 2020
• Optimal Complexity in Decentralized Training · Yucheng Lu, Christopher De Sa · 15 Jun 2020
• Non-convergence of stochastic gradient descent in the training of deep neural networks · Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek · 12 Jun 2020
• Stochastic Polyak Step-size for SGD: An Adaptive Learning Rate for Fast Convergence · Nicolas Loizou, Sharan Vaswani, I. Laradji, Simon Lacoste-Julien · 24 Feb 2020
• Momentum Improves Normalized SGD · Ashok Cutkosky, Harsh Mehta · 09 Feb 2020 [ODL]
• Low Rank Saddle Free Newton: A Scalable Method for Stochastic Nonconvex Optimization · Thomas O'Leary-Roseberry, Nick Alger, Omar Ghattas · 07 Feb 2020 [ODL]