2010.05627
Cited By
Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning
12 October 2020
Pan Zhou
Jiashi Feng
Chao Ma
Caiming Xiong
Guosheng Lin
Weinan E
Papers citing
"Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning"
36 / 36 papers shown
Title
Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods
Ruinan Jin
Difei Cheng
Hong Qiao
Xin Shi
Shaodong Liu
Bo Zhang
43
0
0
17 Apr 2025
High-entropy Advantage in Neural Networks' Generalizability
Entao Yang
Jiahui Geng
Yue Shang
Ge Zhang
AI4CE
66
0
0
17 Mar 2025
Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization
Dun Zeng
Zheshun Wu
Shiyu Liu
Yu Pan
Xiaoying Tang
Zenglin Xu
MLT
FedML
89
1
0
25 Nov 2024
Attribute Inference Attacks for Federated Regression Tasks
Francesco Diana
Othmane Marfoq
Chuan Xu
Giovanni Neglia
F. Giroire
Eoin Thomas
AAML
261
1
0
19 Nov 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
72
1
0
16 Sep 2024
Robust deep labeling of radiological emphysema subtypes using squeeze and excitation convolutional neural networks: The MESA Lung and SPIROMICS Studies
A. Wysoczanski
Nabil Ettehadi
Soroush Arabshahi
Yifei Sun
K. H. Stukovsky
...
A. Comellas
Eric A. Hoffman
Andrew F. Laine
R. G. Barr
Elsa D. Angelini
OOD
19
0
0
01 Mar 2024
Evolutionary algorithms as an alternative to backpropagation for supervised training of Biophysical Neural Networks and Neural ODEs
James Hazelden
Yuhan Helena Liu
Eli Shlizerman
E. Shea-Brown
49
2
0
17 Nov 2023
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
36
2
0
14 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
33
8
0
26 Jun 2023
Semantic Segmentation of Porosity in 4D Spatio-Temporal X-ray μCT of Titanium Coated Ni wires using Deep Learning
Pradyumna Elavarthi
Arun J. Bhattacharjee
A. P. Y. Puente
Anca L. Ralescu
27
0
0
24 Jun 2023
Improving Convergence and Generalization Using Parameter Symmetries
Bo Zhao
Robert Mansel Gower
Robin Walters
Rose Yu
MoMe
33
13
0
22 May 2023
Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
Junchi Yang
Xiang Li
Ilyas Fatkhullin
Niao He
42
15
0
21 May 2023
Heavy-Tailed Regularization of Weight Matrices in Deep Neural Networks
Xuanzhe Xiao
Zengyi Li
Chuanlong Xie
Fengwei Zhou
23
3
0
06 Apr 2023
Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation
Zheyu Zhang
Bin Wang
Lanhong Yao
Ugur Demir
Debesh Jha
I. Turkbey
Boqing Gong
Ulas Bagci
AAML
MedIm
OOD
34
11
0
05 Apr 2023
Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models
Anant Raj
Umut Simsekli
Alessandro Rudi
DiffM
31
1
0
30 Mar 2023
Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation
Zijian Liu
Zhengyuan Zhou
30
10
0
22 Mar 2023
Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash
Ofir Lindenbaum
27
9
0
05 Mar 2023
FOSI: Hybrid First and Second Order Optimization
Hadar Sivan
Moshe Gabel
Assaf Schuster
ODL
34
2
0
16 Feb 2023
A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko
Francesco Croce
Maximilian Müller
Matthias Hein
Nicolas Flammarion
3DH
19
56
0
14 Feb 2023
An SDE for Modeling SAM: Theory and Insights
Enea Monzio Compagnoni
Luca Biggio
Antonio Orvieto
F. Proske
Hans Kersting
Aurelien Lucchi
25
13
0
19 Jan 2023
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
Shancheng Fang
Zhendong Mao
Hongtao Xie
Yuxin Wang
C. Yan
Yongdong Zhang
34
53
0
19 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
Bo Zhao
I. Ganev
Robin Walters
Rose Yu
Nima Dehmamy
49
16
0
31 Oct 2022
When Does Re-initialization Work?
Sheheryar Zaidi
Tudor Berariu
Hyunjik Kim
J. Bornschein
Claudia Clopath
Yee Whye Teh
Razvan Pascanu
40
10
0
20 Jun 2022
Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares
Anant Raj
Melih Barsbey
Mert Gurbuzbalaban
Lingjiong Zhu
Umut Simsekli
19
9
0
02 Jun 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
28
58
0
01 Feb 2022
Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
Tolga Birdal
Aaron Lou
Leonidas J. Guibas
Umut Şimşekli
32
61
0
25 Nov 2021
FastCover: An Unsupervised Learning Framework for Multi-Hop Influence Maximization in Social Networks
Ru-Fen Ni
Xue Li
Fangqi Li
Xiaofeng Gao
Guihai Chen
9
5
0
31 Oct 2021
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang
Minshuo Chen
T. Zhao
Molei Tao
AI4CE
57
40
0
07 Oct 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
89
72
0
29 Sep 2021
High performing ensemble of convolutional neural networks for insect pest image detection
L. Nanni
Alessandro Manfe
Gianluca Maguolo
A. Lumini
S. Brahnam
15
77
0
28 Aug 2021
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT
AI4CE
47
39
0
25 Aug 2021
Logit Attenuating Weight Normalization
Aman Gupta
R. Ramanath
Jun Shi
Anika Ramachandran
Sirou Zhou
Mingzhou Zhou
S. Keerthi
40
1
0
12 Aug 2021
Effective Evaluation of Deep Active Learning on Image Classification Tasks
Nathan Beck
D. Sivasubramanian
Apurva Dani
Ganesh Ramakrishnan
Rishabh K. Iyer
VLM
20
38
0
16 Jun 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
44
78
0
24 Feb 2021
The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks
Bohan Wang
Qi Meng
Wei Chen
Tie-Yan Liu
30
33
0
11 Dec 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
310
2,896
0
15 Sep 2016