On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

Xiaoyun Li, Francesco Orabona
21 May 2018 · arXiv: 1805.08114
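
The paper studies stochastic gradient descent whose stepsize is set adaptively from the observed stochastic gradients rather than from a hand-tuned schedule. The sketch below is a minimal illustration only, assuming an AdaGrad-Norm style global stepsize eta_t = alpha / sqrt(b0^2 + sum of the squared gradient norms seen so far); the function name, the toy objective, and all constants are hypothetical and are not taken from the paper, whose exact stepsize form and assumptions may differ.

import numpy as np

def sgd_adaptive_stepsize(grad_fn, x0, n_steps=1000, alpha=1.0, b0=1.0, seed=0):
    """SGD with a global, AdaGrad-Norm style adaptive stepsize (illustrative sketch).

    eta_t = alpha / sqrt(b0**2 + sum of squared stochastic-gradient norms so far).
    grad_fn(x, rng) must return a stochastic gradient estimate at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    grad_norm_sq_sum = 0.0
    for _ in range(n_steps):
        g = grad_fn(x, rng)                        # stochastic gradient at the current iterate
        grad_norm_sq_sum += float(np.dot(g, g))    # accumulate squared gradient norms
        eta = alpha / np.sqrt(b0**2 + grad_norm_sq_sum)  # adaptive stepsize; no smoothness constant used
        x = x - eta * g                            # SGD update
    return x

# Toy usage (hypothetical): noisy gradients of f(x) = 0.5 * ||x||^2.
if __name__ == "__main__":
    noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
    x_final = sgd_adaptive_stepsize(noisy_grad, x0=np.ones(10), n_steps=5000)
    print("final iterate norm:", np.linalg.norm(x_final))

Because the accumulated squared gradient norm only grows, eta_t is nonincreasing, so the stepsize decays automatically as more gradient information is collected; this self-tuning behavior is what the adaptive-stepsize papers cited below (AdaGrad stepsizes, WNGrad, and related work) analyze.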

Papers citing "On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes"

19 papers shown:
  • Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework. Siyuan Yu, Wei Chen, H. V. Poor. 17 Jun 2024. 0 citations.
  • Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance. Dimitris Oikonomou, Nicolas Loizou. 06 Jun 2024. 5 citations.
  • Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad. Sayantan Choudhury, N. Tupitsa, Nicolas Loizou, Samuel Horváth, Martin Takáč, Eduard A. Gorbunov. 05 Mar 2024. 1 citation.
  • On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions. Yusu Hong, Junhong Lin. 06 Feb 2024. 13 citations.
  • On the Convergence of Adam and Beyond. Sashank J. Reddi, Satyen Kale, Sanjiv Kumar. 19 Apr 2019. 2,482 citations.
  • Online Adaptive Methods, Universality and Acceleration. Kfir Y. Levy, A. Yurtsever, Volkan Cevher. 08 Sep 2018. 89 citations. [ODL]
  • AdaGrad stepsizes: Sharp convergence over nonconvex landscapes. Rachel A. Ward, Xiaoxia Wu, Léon Bottou. 05 Jun 2018. 365 citations. [ODL]
  • WNGrad: Learn the Learning Rate in Gradient Descent. Xiaoxia Wu, Rachel A. Ward, Léon Bottou. 07 Mar 2018. 87 citations.
  • Black-Box Reductions for Parameter-free Online Learning in Banach Spaces. Ashok Cutkosky, Francesco Orabona. 17 Feb 2018. 145 citations.
  • Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark Schmidt. 16 Aug 2016. 1,208 citations.
  • Optimization Methods for Large-Scale Machine Learning. Léon Bottou, Frank E. Curtis, J. Nocedal. 15 Jun 2016. 3,198 citations.
  • Coin Betting and Parameter-Free Online Learning. Francesco Orabona, D. Pál. 12 Feb 2016. 165 citations.
  • Scale-Free Online Learning. Francesco Orabona, D. Pál. 08 Jan 2016. 103 citations.
  • Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. 22 Dec 2014. 149,474 citations. [ODL]
  • Optimization, Learning, and Games with Predictable Sequences. Alexander Rakhlin, Karthik Sridharan. 08 Nov 2013. 377 citations.
  • Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming. Saeed Ghadimi, Guanghui Lan. 22 Sep 2013. 1,538 citations. [ODL]
  • Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization. Julien Mairal. 19 Jun 2013. 160 citations.
  • ADADELTA: An Adaptive Learning Rate Method. Matthew D. Zeiler. 22 Dec 2012. 6,619 citations. [ODL]
  • Optimal Distributed Online Prediction using Mini-Batches. O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao. 07 Dec 2010. 683 citations.