ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning

12 October 2020
Pan Zhou
Jiashi Feng
Chao Ma
Caiming Xiong
Guosheng Lin
Weinan E
arXiv: 2010.05627

Papers citing "Towards Theoretically Understanding Why SGD Generalizes Better Than ADAM in Deep Learning"

36 papers
Stochastic Gradient Descent in Non-Convex Problems: Asymptotic Convergence with Relaxed Step-Size via Stopping Time Methods
Ruinan Jin
Difei Cheng
Hong Qiao
Xin Shi
Shaodong Liu
Bo Zhang
43
0
0
17 Apr 2025
High-entropy Advantage in Neural Networks' Generalizability
Entao Yang
Jiahui Geng
Yue Shang
Ge Zhang
AI4CE
66
0
0
17 Mar 2025
Understanding Generalization of Federated Learning: the Trade-off between Model Stability and Optimization
Dun Zeng
Zheshun Wu
Shiyu Liu
Yu Pan
Xiaoying Tang
Zenglin Xu
MLT
FedML
89
1
0
25 Nov 2024
Attribute Inference Attacks for Federated Regression Tasks
Francesco Diana
Othmane Marfoq
Chuan Xu
Giovanni Neglia
F. Giroire
Eoin Thomas
AAML
261
1
0
19 Nov 2024
Flash STU: Fast Spectral Transform Units
Y. Isabel Liu
Windsor Nguyen
Yagiz Devre
Evan Dogariu
Anirudha Majumdar
Elad Hazan
AI4TS
72
1
0
16 Sep 2024
Robust deep labeling of radiological emphysema subtypes using squeeze and excitation convolutional neural networks: The MESA Lung and SPIROMICS Studies
A. Wysoczanski
Nabil Ettehadi
Soroush Arabshahi
Yifei Sun
K. H. Stukovsky
...
A. Comellas
Eric A. Hoffman
Andrew F. Laine
R. G. Barr
Elsa D. Angelini
OOD
19
0
0
01 Mar 2024
Evolutionary algorithms as an alternative to backpropagation for supervised training of Biophysical Neural Networks and Neural ODEs
James Hazelden
Yuhan Helena Liu
Eli Shlizerman
E. Shea-Brown
49
2
0
17 Nov 2023
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
36
2
0
14 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
33
8
0
26 Jun 2023
Semantic Segmentation of Porosity in 4D Spatio-Temporal X-ray μCT of Titanium Coated Ni wires using Deep Learning
Pradyumna Elavarthi
Arun J. Bhattacharjee
A. P. Y. Puente
Anca L. Ralescu
27
0
0
24 Jun 2023
Improving Convergence and Generalization Using Parameter Symmetries
Bo Zhao
Robert Mansel Gower
Robin Walters
Rose Yu
MoMe
33
13
0
22 May 2023
Two Sides of One Coin: the Limits of Untuned SGD and the Power of Adaptive Methods
Junchi Yang
Xiang Li
Ilyas Fatkhullin
Niao He
42
15
0
21 May 2023
Heavy-Tailed Regularization of Weight Matrices in Deep Neural Networks
Xuanzhe Xiao
Zengyi Li
Chuanlong Xie
Fengwei Zhou
23
3
0
06 Apr 2023
Domain Generalization with Adversarial Intensity Attack for Medical Image Segmentation
Zheyu Zhang
Bin Wang
Lanhong Yao
Ugur Demir
Debesh Jha
I. Turkbey
Boqing Gong
Ulas Bagci
AAML
MedIm
OOD
34
11
0
05 Apr 2023
Efficient Sampling of Stochastic Differential Equations with Positive Semi-Definite Models
Anant Raj
Umut Simsekli
Alessandro Rudi
DiffM
31
1
0
30 Mar 2023
Stochastic Nonsmooth Convex Optimization with Heavy-Tailed Noises: High-Probability Bound, In-Expectation Rate and Initial Distance Adaptation
Zijian Liu
Zhengyuan Zhou
30
10
0
22 Mar 2023
Revisiting the Noise Model of Stochastic Gradient Descent
Barak Battash
Ofir Lindenbaum
27
9
0
05 Mar 2023
FOSI: Hybrid First and Second Order Optimization
Hadar Sivan
Moshe Gabel
Assaf Schuster
ODL
34
2
0
16 Feb 2023
A Modern Look at the Relationship between Sharpness and Generalization
Maksym Andriushchenko
Francesco Croce
Maximilian Müller
Matthias Hein
Nicolas Flammarion
3DH
19
56
0
14 Feb 2023
An SDE for Modeling SAM: Theory and Insights
Enea Monzio Compagnoni
Luca Biggio
Antonio Orvieto
F. Proske
Hans Kersting
Aurelien Lucchi
25
13
0
19 Jan 2023
ABINet++: Autonomous, Bidirectional and Iterative Language Modeling for Scene Text Spotting
Shancheng Fang
Zhendong Mao
Hongtao Xie
Yuxin Wang
C. Yan
Yongdong Zhang
34
53
0
19 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
Bo Zhao
I. Ganev
Robin Walters
Rose Yu
Nima Dehmamy
49
16
0
31 Oct 2022
When Does Re-initialization Work?
Sheheryar Zaidi
Tudor Berariu
Hyunjik Kim
J. Bornschein
Claudia Clopath
Yee Whye Teh
Razvan Pascanu
40
10
0
20 Jun 2022
Algorithmic Stability of Heavy-Tailed Stochastic Gradient Descent on Least Squares
Anant Raj
Melih Barsbey
Mert Gurbuzbalaban
Lingjiong Zhu
Umut Simsekli
19
9
0
02 Jun 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
28
58
0
01 Feb 2022
Intrinsic Dimension, Persistent Homology and Generalization in Neural Networks
Tolga Birdal
Aaron Lou
Leonidas J. Guibas
Umut Şimşekli
32
61
0
25 Nov 2021
FastCover: An Unsupervised Learning Framework for Multi-Hop Influence Maximization in Social Networks
Ru-Fen Ni
Xue Li
Fangqi Li
Xiaofeng Gao
Guihai Chen
9
5
0
31 Oct 2021
Large Learning Rate Tames Homogeneity: Convergence and Balancing Effect
Yuqing Wang
Minshuo Chen
T. Zhao
Molei Tao
AI4CE
57
40
0
07 Oct 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping
Micah Goldblum
Phillip E. Pope
Michael Moeller
Tom Goldstein
89
72
0
29 Sep 2021
High performing ensemble of convolutional neural networks for insect pest image detection
L. Nanni
Alessandro Manfe
Gianluca Maguolo
A. Lumini
S. Brahnam
15
77
0
28 Aug 2021
Understanding the Generalization of Adam in Learning Neural Networks with Proper Regularization
Difan Zou
Yuan Cao
Yuanzhi Li
Quanquan Gu
MLT
AI4CE
47
39
0
25 Aug 2021
Logit Attenuating Weight Normalization
Aman Gupta
R. Ramanath
Jun Shi
Anika Ramachandran
Sirou Zhou
Mingzhou Zhou
S. Keerthi
40
1
0
12 Aug 2021
Effective Evaluation of Deep Active Learning on Image Classification Tasks
Nathan Beck
D. Sivasubramanian
Apurva Dani
Ganesh Ramakrishnan
Rishabh K. Iyer
VLM
20
38
0
16 Jun 2021
On the Validity of Modeling SGD with Stochastic Differential Equations (SDEs)
Zhiyuan Li
Sadhika Malladi
Sanjeev Arora
44
78
0
24 Feb 2021
The Implicit Bias for Adaptive Optimization Algorithms on Homogeneous Neural Networks
Bohan Wang
Qi Meng
Wei Chen
Tie-Yan Liu
30
33
0
11 Dec 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
310
2,896
0
15 Sep 2016