On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes

Xiaoyun Li, Francesco Orabona
21 May 2018 · arXiv: 1805.08114
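
The paper studies stochastic gradient descent whose stepsize is set adaptively from the observed stochastic gradients rather than from a hand-tuned schedule. The sketch below is a minimal illustration only, assuming an AdaGrad-Norm style global stepsize eta_t = alpha / sqrt(b0^2 + sum of the squared gradient norms seen so far); the function name, the toy objective, and all constants are hypothetical and are not taken from the paper, whose exact stepsize form and assumptions may differ.

import numpy as np

def sgd_adaptive_stepsize(grad_fn, x0, n_steps=1000, alpha=1.0, b0=1.0, seed=0):
    """SGD with a global, AdaGrad-Norm style adaptive stepsize (illustrative sketch).

    eta_t = alpha / sqrt(b0**2 + sum of squared stochastic-gradient norms so far).
    grad_fn(x, rng) must return a stochastic gradient estimate at x.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    grad_norm_sq_sum = 0.0
    for _ in range(n_steps):
        g = grad_fn(x, rng)                        # stochastic gradient at the current iterate
        grad_norm_sq_sum += float(np.dot(g, g))    # accumulate squared gradient norms
        eta = alpha / np.sqrt(b0**2 + grad_norm_sq_sum)  # adaptive stepsize; no smoothness constant used
        x = x - eta * g                            # SGD update
    return x

# Toy usage (hypothetical): noisy gradients of f(x) = 0.5 * ||x||^2.
if __name__ == "__main__":
    noisy_grad = lambda x, rng: x + 0.1 * rng.standard_normal(x.shape)
    x_final = sgd_adaptive_stepsize(noisy_grad, x0=np.ones(10), n_steps=5000)
    print("final iterate norm:", np.linalg.norm(x_final))

Because the accumulated squared gradient norm only grows, eta_t is nonincreasing, so the stepsize decays automatically as more gradient information is collected; this self-tuning behavior is what the adaptive-stepsize papers cited below (AdaGrad stepsizes, WNGrad, and related work) analyze.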

Papers citing "On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes"

19 papers shown:
  • Distributed Stochastic Gradient Descent with Staleness: A Stochastic Delay Differential Equation Based Framework. Siyuan Yu, Wei Chen, H. V. Poor. 17 Jun 2024. 0 citations.
  • Stochastic Polyak Step-sizes and Momentum: Convergence Guarantees and Practical Performance. Dimitris Oikonomou, Nicolas Loizou. 06 Jun 2024. 5 citations.
  • Remove that Square Root: A New Efficient Scale-Invariant Version of AdaGrad. Sayantan Choudhury, N. Tupitsa, Nicolas Loizou, Samuel Horváth, Martin Takáč, Eduard A. Gorbunov. 05 Mar 2024. 1 citation.
  • On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions. Yusu Hong, Junhong Lin. 06 Feb 2024. 13 citations.
  • On the Convergence of Adam and Beyond. Sashank J. Reddi, Satyen Kale, Sanjiv Kumar. 19 Apr 2019. 2,482 citations.
  • Online Adaptive Methods, Universality and Acceleration. Kfir Y. Levy, A. Yurtsever, Volkan Cevher. 08 Sep 2018. 89 citations. [ODL]
  • AdaGrad stepsizes: Sharp convergence over nonconvex landscapes. Rachel A. Ward, Xiaoxia Wu, Léon Bottou. 05 Jun 2018. 365 citations. [ODL]
  • WNGrad: Learn the Learning Rate in Gradient Descent. Xiaoxia Wu, Rachel A. Ward, Léon Bottou. 07 Mar 2018. 87 citations.
  • Black-Box Reductions for Parameter-free Online Learning in Banach Spaces. Ashok Cutkosky, Francesco Orabona. 17 Feb 2018. 145 citations.
  • Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition. Hamed Karimi, J. Nutini, Mark Schmidt. 16 Aug 2016. 1,208 citations.
  • Optimization Methods for Large-Scale Machine Learning. Léon Bottou, Frank E. Curtis, J. Nocedal. 15 Jun 2016. 3,198 citations.
  • Coin Betting and Parameter-Free Online Learning. Francesco Orabona, D. Pál. 12 Feb 2016. 165 citations.
  • Scale-Free Online Learning. Francesco Orabona, D. Pál. 08 Jan 2016. 103 citations.
  • Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. 22 Dec 2014. 149,474 citations. [ODL]
  • Optimization, Learning, and Games with Predictable Sequences. Alexander Rakhlin, Karthik Sridharan. 08 Nov 2013. 377 citations.
  • Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming. Saeed Ghadimi, Guanghui Lan. 22 Sep 2013. 1,538 citations. [ODL]
  • Stochastic Majorization-Minimization Algorithms for Large-Scale Optimization. Julien Mairal. 19 Jun 2013. 160 citations.
  • ADADELTA: An Adaptive Learning Rate Method. Matthew D. Zeiler. 22 Dec 2012. 6,619 citations. [ODL]
  • Optimal Distributed Online Prediction using Mini-Batches. O. Dekel, Ran Gilad-Bachrach, Ohad Shamir, Lin Xiao. 07 Dec 2010. 683 citations.