arXiv: 2302.01029
On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance
2 February 2023
Guoqiang Zhang
ODL
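The common thread among the citing works below is Adam's per-coordinate adaptive stepsize, whose range the headline paper proposes to suppress. As context, a minimal NumPy sketch of one standard Adam update (illustrative only; variable names are my own, not taken from any of the listed papers):

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. The effective per-coordinate stepsize is
    lr / (sqrt(v_hat) + eps), the quantity these papers analyse."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment (uncentred variance) estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction for m
    v_hat = v / (1 - beta2 ** t)              # bias correction for v
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```

After bias correction, the first step moves each coordinate by roughly lr times the sign of its gradient, regardless of gradient magnitude; it is this magnitude-invariance, and the wide spread of later stepsizes, that the cited analyses focus on.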
Papers citing
"On Suppressing Range of Adaptive Stepsizes of Adam to Improve Generalisation Performance"
7 / 7 citing papers shown:

AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients
Juntang Zhuang, Tommy M. Tang, Yifan Ding, S. Tatikonda, Nicha Dvornek, X. Papademetris, James S. Duncan
ODL · 15 Oct 2020

On the distance between two neural networks and the stability of learning
Jeremy Bernstein, Arash Vahdat, Yisong Yue, Xuan Li
ODL · 09 Feb 2020

Why are Adaptive Methods Good for Attention Models?
J.N. Zhang, Sai Praneeth Karimireddy, Andreas Veit, Seungyeon Kim, Sashank J. Reddi, Sanjiv Kumar, S. Sra
06 Dec 2019

Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun
ODL · 26 Feb 2019

The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson, Rebecca Roelofs, Mitchell Stern, Nathan Srebro, Benjamin Recht
ODL · 23 May 2017

Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients
Lukas Balles, Philipp Hennig
22 May 2017

Improved Training of Wasserstein GANs
Ishaan Gulrajani, Faruk Ahmed, Martín Arjovsky, Vincent Dumoulin, Aaron Courville
GAN · 31 Mar 2017