1806.06763
Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks
18 June 2018
Jinghui Chen
Dongruo Zhou
Yiqi Tang
Ziyan Yang
Yuan Cao
Quanquan Gu
ODL
Papers citing "Closing the Generalization Gap of Adaptive Gradient Methods in Training Deep Neural Networks"
36 / 36 papers shown
Spectral-factorized Positive-definite Curvature Learning for NN Training
Wu Lin
Felix Dangel
Runa Eschenhagen
Juhan Bae
Richard E. Turner
Roger B. Grosse
103
0
0
10 Feb 2025
Regularized Gradient Clipping Provably Trains Wide and Deep Neural Networks
Matteo Tucat
Anirbit Mukherjee
Procheta Sen
Mingfei Sun
Omar Rivasplata
MLT
51
1
0
12 Apr 2024
On Convergence of Adam for Stochastic Optimization under Relaxed Assumptions
Yusu Hong
Junhong Lin
76
13
0
06 Feb 2024
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuangyu Ding
Jingyang Li
Kim-Chuan Toh
64
8
0
26 Jun 2023
KOALA: A Kalman Optimization Algorithm with Loss Adaptivity
A. Davtyan
Sepehr Sameni
L. Cerkezi
Givi Meishvili
Adam Bielski
Paolo Favaro
ODL
91
2
0
07 Jul 2021
Towards Better Understanding of Adaptive Gradient Algorithms in Generative Adversarial Nets
Mingrui Liu
Youssef Mroueh
Jerret Ross
Wei Zhang
Xiaodong Cui
Payel Das
Tianbao Yang
ODL
53
63
0
26 Dec 2019
Layer-Dependent Importance Sampling for Training Deep and Large Graph Convolutional Networks
Difan Zou
Ziniu Hu
Yewen Wang
Song Jiang
Yizhou Sun
Quanquan Gu
GNN
73
282
0
17 Nov 2019
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares
Rong Ge
Sham Kakade
Rahul Kidambi
Praneeth Netrapalli
67
152
0
29 Apr 2019
On the Convergence of Adam and Beyond
Sashank J. Reddi
Satyen Kale
Sanjiv Kumar
49
2,482
0
19 Apr 2019
Adaptive Gradient Methods with Dynamic Bound of Learning Rate
Liangchen Luo
Yuanhao Xiong
Yan Liu
Xu Sun
ODL
36
600
0
26 Feb 2019
On the Convergence of A Class of Adam-Type Algorithms for Non-Convex Optimization
Xiangyi Chen
Sijia Liu
Ruoyu Sun
Mingyi Hong
40
322
0
08 Aug 2018
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Rachel A. Ward
Xiaoxia Wu
Léon Bottou
ODL
45
365
0
05 Jun 2018
On the Convergence of Stochastic Gradient Descent with Adaptive Stepsizes
Xiaoyun Li
Francesco Orabona
51
294
0
21 May 2018
On the insufficiency of existing momentum schemes for Stochastic Optimization
Rahul Kidambi
Praneeth Netrapalli
Prateek Jain
Sham Kakade
ODL
63
118
0
15 Mar 2018
Understanding Short-Horizon Bias in Stochastic Meta-Optimization
Yuhuai Wu
Mengye Ren
Renjie Liao
Roger B. Grosse
68
137
0
06 Mar 2018
A Walk with SGD
Chen Xing
Devansh Arpit
Christos Tsirigotis
Yoshua Bengio
65
118
0
24 Feb 2018
Improving Generalization Performance by Switching from Adam to SGD
N. Keskar
R. Socher
ODL
64
522
0
20 Dec 2017
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
91
2,112
0
14 Nov 2017
Normalized Direction-preserving Adam
Zijun Zhang
Lin Ma
Zongpeng Li
Chuan Wu
ODL
35
29
0
13 Sep 2017
Variants of RMSProp and Adagrad with Logarithmic Regret Bounds
Mahesh Chandra Mukkamala
Matthias Hein
ODL
38
258
0
17 Jun 2017
Train longer, generalize better: closing the generalization gap in large batch training of neural networks
Elad Hoffer
Itay Hubara
Daniel Soudry
ODL
138
798
0
24 May 2017
The Marginal Value of Adaptive Gradient Methods in Machine Learning
Ashia Wilson
Rebecca Roelofs
Mitchell Stern
Nathan Srebro
Benjamin Recht
ODL
48
1,023
0
23 May 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.0K
20,692
0
17 Apr 2017
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
415
10,281
0
16 Nov 2016
Semi-Supervised Classification with Graph Convolutional Networks
Thomas Kipf
Max Welling
GNN
SSL
424
28,795
0
09 Sep 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
607
36,599
0
25 Aug 2016
Accelerate Stochastic Subgradient Method by Leveraging Local Growth Condition
Yi Tian Xu
Qihang Lin
Tianbao Yang
47
11
0
04 Jul 2016
Wide Residual Networks
Sergey Zagoruyko
N. Komodakis
244
7,951
0
23 May 2016
Stochastic Variance Reduction for Nonconvex Optimization
Sashank J. Reddi
Ahmed S. Hefny
S. Sra
Barnabás Póczos
Alex Smola
80
598
0
19 Mar 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.3K
192,638
0
10 Dec 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat
ObjD
393
61,900
0
04 Jun 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
678
149,474
0
22 Dec 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
822
99,991
0
04 Sep 2014
Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming
Saeed Ghadimi
Guanghui Lan
ODL
57
1,538
0
22 Sep 2013
ADADELTA: An Adaptive Learning Rate Method
Matthew D. Zeiler
ODL
106
6,619
0
22 Dec 2012
Adaptive Bound Optimization for Online Convex Optimization
H. B. McMahan
Matthew J. Streeter
ODL
75
386
0
26 Feb 2010