LaProp: Separating Momentum and Adaptivity in Adam
arXiv:2002.04839 · 12 February 2020
Liu Ziyin, Zhikang T. Wang, Masahito Ueda
Papers citing "LaProp: Separating Momentum and Adaptivity in Adam" (7 papers shown)
1. What makes a good feedforward computational graph? Alex Vitvitskyi, J. G. Araújo, Marc Lackenby, Petar Velickovic. 10 Feb 2025.
2. Optimization for deep learning: theory and algorithms. Ruoyu Sun. 19 Dec 2019.
3. Adaptive Gradient Methods with Dynamic Bound of Learning Rate. Liangchen Luo, Yuanhao Xiong, Yan Liu, Xu Sun. 26 Feb 2019.
4. Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks. Difan Zou, Yuan Cao, Dongruo Zhou, Quanquan Gu. 21 Nov 2018.
5. A general system of differential equations to model first order adaptive algorithms. André Belotto da Silva, Maxime Gazeau. 31 Oct 2018.
6. Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks. Jinghui Chen, Dongruo Zhou, Yiqi Tang, Ziyan Yang, Yuan Cao, Quanquan Gu. 18 Jun 2018.
7. Dissecting Adam: The Sign, Magnitude and Variance of Stochastic Gradients. Lukas Balles, Philipp Hennig. 22 May 2017.