Improving Generalization Performance by Switching from Adam to SGD
N. Keskar, R. Socher
arXiv:1712.07628, 20 December 2017. [ODL]

Papers citing "Improving Generalization Performance by Switching from Adam to SGD"
50 of 181 citing papers shown:
- Why is parameter averaging beneficial in SGD? An objective smoothing perspective (18 Feb 2023). Atsushi Nitanda, Ryuhei Kikuchi, Shugo Maeda, Denny Wu. [FedML]
- Cyclic and Randomized Stepsizes Invoke Heavier Tails in SGD than Constant Stepsize (10 Feb 2023). Mert Gurbuzbalaban, Yuanhan Hu, Umut Simsekli, Lingjiong Zhu. [LRM]
- Flatter, faster: scaling momentum for optimal speedup of SGD (28 Oct 2022). Aditya Cowsik, T. Can, Paolo Glorioso.
- Adaptive Gradient Methods at the Edge of Stability (29 Jul 2022). Jeremy M. Cohen, Behrooz Ghorbani, Shankar Krishnan, Naman Agarwal, Sourabh Medapati, ..., Daniel Suo, David E. Cardoze, Zachary Nado, George E. Dahl, Justin Gilmer. [ODL]
- PoF: Post-Training of Feature Extractor for Improving Generalization (05 Jul 2022). Ikuro Sato, Ryota Yamada, Masayuki Tanaka, Nakamasa Inoue, Rei Kawakami.
- A Closer Look at Smoothness in Domain Adversarial Training (16 Jun 2022). Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, Arihant Jain, R. Venkatesh Babu.
- WaveMix: A Resource-efficient Neural Network for Image Analysis (28 May 2022). Pranav Jeevan, Kavitha Viswanathan, S. AnanduA, A. Sethi.
- MolMiner: You only look once for chemical structure recognition (23 May 2022). Youjun Xu, Jinchuan Xiao, Chia-Han Chou, Jianhang Zhang, Jintao Zhu, ..., Zhen Zhang, Shuhao Zhang, Weilin Zhang, L. Lai, Jianfeng Pei.
- A Dynamic Weighted Tabular Method for Convolutional Neural Networks (20 May 2022). Md Ifraham Iqbal, Md. Saddam Hossain Mukta, Ahmed Rafi Hasan. [LMTD]
- An Adaptive Gradient Method with Energy and Momentum (23 Mar 2022). Hailiang Liu, Xuping Tian. [ODL]
- Surrogate Gap Minimization Improves Sharpness-Aware Training (15 Mar 2022). Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Huayu Chen, Hartwig Adam, Nicha Dvornek, S. Tatikonda, James Duncan, Ting Liu.
- Optimal learning rate schedules in high-dimensional non-convex optimization problems (09 Feb 2022). Stéphane d'Ascoli, Maria Refinetti, Giulio Biroli.
- Convolutional Xformers for Vision (25 Jan 2022). Pranav Jeevan, Amit Sethi. [ViT]
- Weight Expansion: A New Perspective on Dropout and Generalization (23 Jan 2022). Gao Jin, Xinping Yi, Pengfei Yang, Lijun Zhang, S. Schewe, Xiaowei Huang.
- A Convergent ADMM Framework for Efficient Neural Network Training (22 Dec 2021). Junxiang Wang, Hongyi Li, Liang Zhao.
- Training Deep Neural Networks with Adaptive Momentum Inspired by the Quadratic Optimization (18 Oct 2021). Tao Sun, Huaming Ling, Zuoqiang Shi, Dongsheng Li, Bao Wang. [ODL]
- Toward Communication Efficient Adaptive Gradient Method (10 Sep 2021). Xiangyi Chen, Xiaoyun Li, P. Li. [FedML]
- Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization (25 Aug 2021). Difan Zou, Yuan Cao, Yuanzhi Li, Quanquan Gu. [MLT, AI4CE]
- VTLayout: Fusion of Visual and Text Features for Document Layout Analysis (12 Aug 2021). Shoubin Li, Xuyan Ma, Shuaiqun Pan, Jun Hu, Lin Shi, Qing Wang.
- Physics-constrained Deep Learning for Robust Inverse ECG Modeling (26 Jul 2021). Jianxin Xie, B. Yao.
- Exploring the efficacy of neural networks for trajectory compression and the inverse problem (19 Jul 2021). Theodoros Ntakouris.
- Globally Convergent Multilevel Training of Deep Residual Networks (15 Jul 2021). Alena Kopaničáková, Rolf Krause.
- Rethinking Adam: A Twofold Exponential Moving Average Approach (22 Jun 2021). Yizhou Wang, Yue Kang, Can Qin, Huan Wang, Yi Xu, Yulun Zhang, Y. Fu. [ODL]
- A decreasing scaling transition scheme from Adam to SGD (12 Jun 2021). Kun Zeng, Jinlan Liu, Zhixia Jiang, Dongpo Xu. [ODL]
- Polygonal Unadjusted Langevin Algorithms: Creating stable and efficient adaptive algorithms for neural networks (28 May 2021). Dong-Young Lim, Sotirios Sabanis.
- TAG: Task-based Accumulated Gradients for Lifelong learning (11 May 2021). Pranshu Malviya, B. Ravindran, Sarath Chandar. [CLL]
- Coconut trees detection and segmentation in aerial imagery using mask region-based convolution neural network (10 May 2021). M. Iqbal, Hazrat Ali, Son N. Tran, Talha Iqbal.
- BERT-based Chinese Text Classification for Emergency Domain with a Novel Loss Function (09 Apr 2021). Zhongju Wang, Long Wang, Chao-Wei Huang, Xiong Luo.
- Quantum Enhanced Filter: QFilter (07 Apr 2021). Parfait Atchade-Adelomou, Guillermo Alonso-Linaje.
- Exploiting Adam-like Optimization Algorithms to Improve the Performance of Convolutional Neural Networks (26 Mar 2021). L. Nanni, Gianluca Maguolo, A. Lumini. [ODL, MedIm]
- Smoothness Analysis of Adversarial Training (02 Mar 2021). Sekitoshi Kanai, Masanori Yamada, Hiroshi Takahashi, Yuki Yamanaka, Yasutoshi Ida. [AAML]
- Statistical Measures For Defining Curriculum Scoring Function (27 Feb 2021). Vinu Sankar Sadasivan, A. Dasgupta.
- Versatile and Robust Transient Stability Assessment via Instance Transfer Learning (20 Feb 2021). Seyedali Meghdadi, Guido Tack, Ariel Liebman, Nicolas Langrené, Christoph Bergmeir.
- Local Convergence of Adaptive Gradient Descent Optimizers (19 Feb 2021). Sebastian Bock, M. Weiß. [ODL]
- Structured Dropout Variational Inference for Bayesian Neural Networks (16 Feb 2021). S. Nguyen, Duong Nguyen, Khai Nguyen, Khoat Than, Hung Bui, Nhat Ho. [BDL, DRL]
- Dosimetric impact of physician style variations in contouring CTV for post-operative prostate cancer: A deep learning-based simulation study (01 Feb 2021). Anjali Balagopal, D. Nguyen, Maryam Mashayekhi, H. Morgan, A. Garant, N. Desai, R. Hannan, Mu-Han Lin, Steve B. Jiang. [OOD]
- Optimizing Convergence for Iterative Learning of ARIMA for Stationary Time Series (25 Jan 2021). K. Styp-Rekowski, Florian Schmidt, O. Kao. [AI4TS]
- AsymptoticNG: A regularized natural gradient optimization algorithm with look-ahead strategy (24 Dec 2020). Zedong Tang, Fenlong Jiang, Junke Song, Maoguo Gong, Hao Li, F. Yu, Zidong Wang, Min Wang. [ODL]
- The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks (11 Dec 2020). Bohan Wang, Qi Meng, Wei Chen, Tie-Yan Liu.
- Mixing ADAM and SGD: a Combined Optimization Method (16 Nov 2020). Nicola Landro, I. Gallo, Riccardo La Grassa. [ODL]
- A Random Matrix Theory Approach to Damping in Deep Learning (15 Nov 2020). Diego Granziol, Nicholas P. Baskerville. [AI4CE, ODL]
- SALR: Sharpness-aware Learning Rate Scheduler for Improved Generalization (10 Nov 2020). Xubo Yue, Maher Nouiehed, Raed Al Kontar. [ODL]
- "You eat with your eyes first": Optimizing Yelp Image Advertising (03 Nov 2020). Gaurab Banerjee, Samuel Spinner, Yasmine Mitchell.
- Deep Learning based Automated Forest Health Diagnosis from Aerial Images (16 Oct 2020). Chia-Yen Chiang, Chloe M. Barnes, Plamen Angelov, Richard M. Jiang.
- AdaBelief Optimizer: Adapting Stepsizes by the Belief in Observed Gradients (15 Oct 2020). Juntang Zhuang, Tommy M. Tang, Yifan Ding, S. Tatikonda, Nicha Dvornek, X. Papademetris, James S. Duncan. [ODL]
- Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning (12 Oct 2020). Pan Zhou, Jiashi Feng, Chao Ma, Caiming Xiong, Guosheng Lin, E. Weinan.
- Sharpness-Aware Minimization for Efficiently Improving Generalization (03 Oct 2020). Pierre Foret, Ariel Kleiner, H. Mobahi, Behnam Neyshabur. [AAML]
- Expectigrad: Fast Stochastic Optimization with Robust Convergence Properties (03 Oct 2020). Brett Daley, Chris Amato. [ODL]
- Projection-Free Adaptive Gradients for Large-Scale Optimization (29 Sep 2020). Cyrille W. Combettes, Christoph Spiegel, Sebastian Pokutta. [ODL]
- Faster Biological Gradient Descent Learning (27 Sep 2020). H. Li. [ODL]