ResearchTrend.AI

arXiv:2106.00092
Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective


31 May 2021
Kushal Chakrabarti
Nikhil Chopra
Communities: ODL, AI4CE
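For context, the Adam iteration that this paper (and many of the citing works below) analyzes can be sketched as follows. This is a minimal illustration of the standard Adam update of Kingma & Ba, not code from the paper; the function and variable names are illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update: exponential moment estimates with bias correction.

    theta : current parameters
    grad  : gradient at theta
    m, v  : running first and second moment estimates
    t     : step counter, starting at 1 (needed for bias correction)
    """
    m = beta1 * m + (1 - beta1) * grad        # first moment (mean of gradients)
    v = beta2 * v + (1 - beta2) * grad**2     # second moment (uncentered variance)
    m_hat = m / (1 - beta1**t)                # bias-corrected moments
    v_hat = v / (1 - beta2**t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

For example, running this update on the quadratic f(theta) = theta^2 (gradient 2*theta) drives theta toward the minimizer at 0.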

Papers citing "Generalized AdaGrad (G-AdaGrad) and Adam: A State-Space Perspective"

18 papers shown
  • AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients. Juntang Zhuang, Tommy M. Tang, Yifan Ding, S. Tatikonda, Nicha Dvornek, X. Papademetris, James S. Duncan. 15 Oct 2020. [ODL]
  • Iterative Pre-Conditioning for Expediting the Gradient-Descent Method: The Distributed Linear Least-Squares Problem. Kushal Chakrabarti, Nirupam Gupta, Nikhil Chopra. 06 Aug 2020.
  • A Simple Convergence Proof of Adam and Adagrad. Alexandre Défossez, Léon Bottou, Francis R. Bach, Nicolas Usunier. 05 Mar 2020.
  • On the Convergence of Adam and Beyond. Sashank J. Reddi, Satyen Kale, Sanjiv Kumar. 19 Apr 2019.
  • Convergence and Dynamical Behavior of the ADAM Algorithm for Non-Convex Stochastic Optimization. Anas Barakat, Pascal Bianchi. 04 Oct 2018.
  • AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods. Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu. 29 Sep 2018.
  • On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization. Xiangyi Chen, Sijia Liu, Ruoyu Sun, Mingyi Hong. 08 Aug 2018.
  • Convergence guarantees for RMSProp and ADAM in non-convex optimization and an empirical comparison to Nesterov acceleration. Soham De, Anirbit Mukherjee, Enayat Ullah. 18 Jul 2018.
  • AdaGrad stepsizes: Sharp convergence over nonconvex landscapes. Rachel A. Ward, Xiaoxia Wu, Léon Bottou. 05 Jun 2018. [ODL]
  • On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes. Xiaoyun Li, Francesco Orabona. 21 May 2018.
  • WNGrad: Learn the Learning Rate in Gradient Descent. Xiaoxia Wu, Rachel A. Ward, Léon Bottou. 07 Mar 2018.
  • Deep contextualized word representations. Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, Luke Zettlemoyer. 15 Feb 2018. [NAI]
  • The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht. 23 May 2017. [ODL]
  • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation. Yonghui Wu, M. Schuster, Zhifeng Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean. 26 Sep 2016. [AIMat]
  • Optimization Methods for Large-Scale Machine Learning. Léon Bottou, Frank E. Curtis, J. Nocedal. 15 Jun 2016.
  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Alec Radford, Luke Metz, Soumith Chintala. 19 Nov 2015. [GAN, OOD]
  • Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba. 22 Dec 2014. [ODL]
  • ADADELTA: An Adaptive Learning Rate Method. Matthew D. Zeiler. 22 Dec 2012. [ODL]