Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.02369
Cited By
v1
v2
v3 (latest)
Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks
5 May 2025
Juyoung Yun
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Sharpness-Aware Minimization with Z-Score Gradient Filtering for Neural Networks"
19 / 19 papers shown
Title
Mitigating Gradient Overlap in Deep Residual Networks with Gradient Normalization for Improved Non-Convex Optimization
Juyoung Yun
44
3
0
28 Oct 2024
Friendly Sharpness-Aware Minimization
Tao Li
Pan Zhou
Zhengbao He
Xinwen Cheng
Xiaolin Huang
AAML
84
17
0
19 Mar 2024
Robust Neural Pruning with Gradient Sampling Optimization for Residual Neural Networks
Juyoung Yun
61
1
0
26 Dec 2023
GA-SAM: Gradient-Strength based Adaptive Sharpness-Aware Minimization for Improved Generalization
Zhiyuan Zhang
Ruixuan Luo
Qi Su
Xueting Sun
105
13
0
13 Oct 2022
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen
Cho-Jui Hsieh
Boqing Gong
ViT
112
330
0
03 Jun 2021
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon
Jeongseop Kim
Hyunseong Park
I. Choi
128
291
0
23 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
780
41,945
0
22 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
251
1,363
0
03 Oct 2020
Gradient Centralization: A New Optimization Technique for Deep Neural Networks
Hongwei Yong
Jianqiang Huang
Xiansheng Hua
Lei Zhang
ODL
100
188
0
03 Apr 2020
Benign Overfitting in Linear Regression
Peter L. Bartlett
Philip M. Long
Gábor Lugosi
Alexander Tsigler
MLT
135
780
0
26 Jun 2019
Exploring Generalization in Deep Learning
Behnam Neyshabur
Srinadh Bhojanapalli
David A. McAllester
Nathan Srebro
FAtt
205
1,261
0
27 Jun 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
986
133,586
0
12 Jun 2017
Understanding deep learning requires rethinking generalization
Chiyuan Zhang
Samy Bengio
Moritz Hardt
Benjamin Recht
Oriol Vinyals
HAI
374
4,641
0
10 Nov 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
568
2,948
0
15 Sep 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
450
10,570
0
21 Jul 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.7K
195,297
0
10 Dec 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
802
43,419
0
11 Feb 2015
Very Deep Convolutional Networks for Large-Scale Image Recognition
Karen Simonyan
Andrew Zisserman
FAtt
MDE
2.0K
100,832
0
04 Sep 2014
On the difficulty of training Recurrent Neural Networks
Razvan Pascanu
Tomas Mikolov
Yoshua Bengio
ODL
310
5,370
0
21 Nov 2012
1