Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate

19 May 2018
Haiwen Huang, Changzhang Wang, Bin Dong
    ODL
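The title summarizes the core idea: instead of Adam's exponentially decaying average of squared gradients, the adaptive learning rate is built from an average that keeps relatively more weight on older gradients. Below is a minimal NumPy sketch of such a "nostalgic" second-moment update, assuming a hyper-harmonic weighting b_k = k**(-gamma); the function name, hyperparameters, and defaults are illustrative rather than the paper's exact algorithm.

```python
import numpy as np

def nosadam_like_update(params, grad_fn, steps=100, alpha=0.1,
                        beta1=0.9, gamma=0.1, eps=1e-8):
    """Illustrative 'nostalgic' Adam-style optimizer sketch.

    The second-moment average weights the squared gradient at step k by
    b_k = k**(-gamma), so older gradients keep relatively more influence
    than under Adam's exponential decay. Hyperparameters are placeholders.
    """
    m = np.zeros_like(params, dtype=float)   # first moment (standard momentum)
    v = np.zeros_like(params, dtype=float)   # weighted second-moment estimate
    B = 0.0                                  # running sum of the weights b_k
    for t in range(1, steps + 1):
        g = grad_fn(params)
        b_t = t ** (-gamma)                  # weight on the current squared gradient
        B_next = B + b_t
        v = (B / B_next) * v + (b_t / B_next) * g ** 2
        B = B_next
        m = beta1 * m + (1 - beta1) * g
        params = params - (alpha / np.sqrt(t)) * m / (np.sqrt(v) + eps)
    return params
```

For instance, nosadam_like_update(np.array([5.0]), lambda x: 2 * x, steps=2000) takes Adam-like steps toward the minimum of x**2; with gamma = 0 the weighting reduces to a uniform average over all past squared gradients.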

Papers citing "Nostalgic Adam: Weighting more of the past gradients when designing the adaptive learning rate"

11 / 11 papers shown
On the Convergence of Adam and Beyond
Sashank J. Reddi, Satyen Kale, Sanjiv Kumar
87 · 2,494 · 0 · 19 Apr 2019
AdaShift: Decorrelation and Convergence of Adaptive Learning Rate Methods
Zhiming Zhou, Qingru Zhang, Guansong Lu, Hongwei Wang, Weinan Zhang, Yong Yu
49 · 66 · 0 · 29 Sep 2018
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu
ODL
72 · 192 · 0 · 18 Jun 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein
240 · 1,885 · 0 · 28 Dec 2017
Improving Generalization Performance by Switching from Adam to SGD
N. Keskar, R. Socher
ODL
86 · 523 · 0 · 20 Dec 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht
ODL
56 · 1,028 · 0 · 23 May 2017
Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas Balles, Philipp Hennig
66 · 168 · 0 · 22 May 2017
Wide Residual Networks
Sergey Zagoruyko, N. Komodakis
318 · 7,971 · 0 · 23 May 2016
Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
MedIm
2.0K · 193,426 · 0 · 10 Dec 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma, Jimmy Ba
ODL
1.5K · 149,842 · 0 · 22 Dec 2014
ADADELTA: An Adaptive Learning Rate Method
Matthew D. Zeiler
ODL
132 · 6,623 · 0 · 22 Dec 2012