Normalization Layers Are All That Sharpness-Aware Minimization Needs
Maximilian Mueller, Tiffany J. Vlaar, David Rolnick, Matthias Hein
arXiv:2306.04226, 7 June 2023
Papers citing "Normalization Layers Are All That Sharpness-Aware Minimization Needs" (20 of 20 papers shown):
Transformers without Normalization. Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu. 13 Mar 2025. (ViT, OffRL)
Towards Understanding the Role of Sharpness-Aware Minimization Algorithms for Out-of-Distribution Generalization. Samuel Schapiro, Han Zhao. 06 Dec 2024.
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing. Arpan Mukherjee, Shashanka Ubaru, K. Murugesan, Karthikeyan Shanmugam, A. Tajer. 14 Oct 2024.
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training. Zhanpeng Zhou, Mingze Wang, Yuchen Mao, Bingrui Li, Junchi Yan. 14 Oct 2024. (AAML)
Understanding Adversarially Robust Generalization via Weight-Curvature Index. Yuelin Xu, Xiao Zhang. 10 Oct 2024. (AAML)
Can Optimization Trajectories Explain Multi-Task Transfer? David Mueller, Mark Dredze, Nicholas Andrews. 26 Aug 2024.
Flat Posterior Does Matter For Bayesian Model Averaging. Sungjun Lim, Jeyoon Yeom, Sooyon Kim, Hoyoon Byun, Jinho Kang, Yohan Jung, Jiyoung Jung, Kyungwoo Song. 21 Jun 2024. (AAML, BDL)
Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics. Ankit Vani, Frederick Tung, Gabriel L. Oliveira, Hossein Sharifi-Noghabi. 10 Jun 2024. (AAML)
Improving Generalization and Convergence by Enhancing Implicit Regularization. Mingze Wang, Haotian He, Jinbo Wang, Zilin Wang, Guanhua Huang, Zhiyu Li, E. Weinan, Lei Wu. 31 May 2024.
Locally Estimated Global Perturbations are Better than Local Perturbations for Federated Sharpness-aware Minimization. Ziqing Fan, Shengchao Hu, Jiangchao Yao, Gang Niu, Ya Zhang, Masashi Sugiyama, Yanfeng Wang. 29 May 2024. (FedML)
On the Duality Between Sharpness-Aware Minimization and Adversarial Training. Yihao Zhang, Hangzhou He, Jingyu Zhu, Huanran Chen, Yifei Wang, Zeming Wei. 23 Feb 2024. (AAML)
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead. Marlon Becker, Frederick Altrock, Benjamin Risse. 22 Jan 2024.
Layer-wise Linear Mode Connectivity. Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi. 13 Jul 2023. (FedML, FAtt, MoMe)
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models. Clara Na, Sanket Vaibhav Mehta, Emma Strubell. 25 May 2022.
Sharpness-Aware Minimization Improves Language Model Generalization. Dara Bahri, H. Mobahi, Yi Tay. 16 Oct 2021.
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks. Jiawei Du, Hanshu Yan, Jiashi Feng, Qiufeng Wang, Liangli Zhen, Rick Siow Mong Goh, Vincent Y. F. Tan. 07 Oct 2021. (AAML)
High-Performance Large-Scale Image Recognition Without Normalization. Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan. 11 Feb 2021. (VLM)
Aggregated Residual Transformations for Deep Neural Networks. Saining Xie, Ross B. Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He. 16 Nov 2016.
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016. (ODL)
Densely Connected Convolutional Networks. Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger. 25 Aug 2016. (PINN, 3DV)