Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate
arXiv:1805.07557 · 19 May 2018
Haiwen Huang, Changzhang Wang, Bin Dong
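The title describes an Adam variant whose adaptive learning rate places relatively more weight on past gradients. Below is a minimal, hedged sketch of one way such an update can look: an Adam-style step whose second-moment average uses non-increasing per-step weights. The specific hyperharmonic weights b_k = k^(-gamma), the function name, and all hyperparameter values are illustrative assumptions, not details taken from this page.

```python
# Sketch of an Adam-style update whose second-moment estimate weights past
# squared gradients more heavily than plain exponential averaging would.
# The weighting b_k = k**(-gamma) is an assumption made for illustration.
import numpy as np

def nostalgic_adam_like_step(x, grad, state, lr=1e-3, beta1=0.9, gamma=0.1, eps=1e-8):
    """One parameter update. `state` carries the step count k, m, v, and B_k."""
    k = state.get("k", 0) + 1
    m = state.get("m", np.zeros_like(x))
    v = state.get("v", np.zeros_like(x))
    B_prev = state.get("B", 0.0)

    b_k = k ** (-gamma)        # assumed non-increasing weight on the k-th squared gradient
    B_k = B_prev + b_k         # running sum of weights

    m = beta1 * m + (1.0 - beta1) * grad                # first moment, as in Adam
    m_hat = m / (1.0 - beta1 ** k)                      # Adam-style bias correction
    v = (B_prev / B_k) * v + (b_k / B_k) * grad ** 2    # weighted second-moment average

    x_new = x - lr * m_hat / (np.sqrt(v) + eps)
    state.update(k=k, m=m, v=v, B=B_k)
    return x_new, state

# Usage: minimize f(x) = ||x||^2 from x = (1, 1, 1); the gradient is 2x.
x, state = np.ones(3), {}
for _ in range(200):
    x, state = nostalgic_adam_like_step(x, 2.0 * x, state, lr=0.05)
print(x)  # should end up close to zero
```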
Papers citing "Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate" (11 of 11 shown):

1. On the Convergence of Adam and Beyond. Sashank J. Reddi, Satyen Kale, Sanjiv Kumar (19 Apr 2019)
2. AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods. Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu (29 Sep 2018)
3. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks. Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu (18 Jun 2018)
4. Visualizing the Loss Landscape of Neural Nets. Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein (28 Dec 2017)
5. Improving Generalization Performance by Switching from Adam to SGD. N. Keskar, R. Socher (20 Dec 2017)
6. The Marginal Value of Adaptive Gradient Methods in Machine Learning. Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht (23 May 2017)
7. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients. Lukas Balles, Philipp Hennig (22 May 2017)
8. Wide Residual Networks. Sergey Zagoruyko, N. Komodakis (23 May 2016)
9. Deep Residual Learning for Image Recognition. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (10 Dec 2015)
10. Adam: A Method for Stochastic Optimization. Diederik P. Kingma, Jimmy Ba (22 Dec 2014)
11. ADADELTA: An Adaptive Learning Rate Method. Matthew D. Zeiler (22 Dec 2012)