Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate
arXiv:1805.07557 · 19 May 2018
Haiwen Huang, Changzhang Wang, Bin Dong
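The title describes an Adam variant whose adaptive learning rate places relatively more weight on past gradients. Below is a minimal, hedged sketch of one way such an update can look: an Adam-style step whose second-moment average uses non-increasing per-step weights. The specific hyperharmonic weights b_k = k^(-gamma), the function name, and all hyperparameter values are illustrative assumptions, not details taken from this page.

```python
# Sketch of an Adam-style update whose second-moment estimate weights past
# squared gradients more heavily than plain exponential averaging would.
# The weighting b_k = k**(-gamma) is an assumption made for illustration.
import numpy as np

def nostalgic_adam_like_step(x, grad, state, lr=1e-3, beta1=0.9, gamma=0.1, eps=1e-8):
    """One parameter update. `state` carries the step count k, m, v, and B_k."""
    k = state.get("k", 0) + 1
    m = state.get("m", np.zeros_like(x))
    v = state.get("v", np.zeros_like(x))
    B_prev = state.get("B", 0.0)

    b_k = k ** (-gamma)        # assumed non-increasing weight on the k-th squared gradient
    B_k = B_prev + b_k         # running sum of weights

    m = beta1 * m + (1.0 - beta1) * grad                # first moment, as in Adam
    m_hat = m / (1.0 - beta1 ** k)                      # Adam-style bias correction
    v = (B_prev / B_k) * v + (b_k / B_k) * grad ** 2    # weighted second-moment average

    x_new = x - lr * m_hat / (np.sqrt(v) + eps)
    state.update(k=k, m=m, v=v, B=B_k)
    return x_new, state

# Usage: minimize f(x) = ||x||^2 from x = (1, 1, 1); the gradient is 2x.
x, state = np.ones(3), {}
for _ in range(200):
    x, state = nostalgic_adam_like_step(x, 2.0 * x, state, lr=0.05)
print(x)  # should end up close to zero
```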
Papers citing "Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate" (11 of 11 shown):

1. On the Convergence of Adam and Beyond. Sashank J. Reddi, Satyen Kale, Sanjiv Kumar (19 Apr 2019)
2. AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods. Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu (29 Sep 2018)
3. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks. Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu (18 Jun 2018)
4. Visualizing the Loss Landscape of Neural Nets. Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein (28 Dec 2017)
5. Improving Generalization Performance by Switching from Adam to SGD. N. Keskar, R. Socher (20 Dec 2017)
6. The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht (23 May 2017)
7. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients. Lukas Balles, Philipp Hennig (22 May 2017)
8. Wide Residual Networks. Sergey Zagoruyko, N. Komodakis (23 May 2016)
9. Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (10 Dec 2015)
10. Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba (22 Dec 2014)
11. ADADELTA: An Adaptive Learning Rate Method. Matthew D. Zeiler (22 Dec 2012)