Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri, H. Mobahi, Yi Tay
arXiv 2110.08529 · 16 October 2021
Papers citing "Sharpness-Aware Minimization Improves Language Model Generalization" (36 of 86 papers shown):
The Crucial Role of Normalization in Sharpness-Aware Minimization. Yan Dai, Kwangjun Ahn, S. Sra. 24 May 2023.
Biomedical Named Entity Recognition via Dictionary-based Synonym Generalization. Z. Fu, Yixuan Su, Zaiqiao Meng, Nigel Collier. 22 May 2023. (MedIm)
Sharpness & Shift-Aware Self-Supervised Learning. Ngoc N. Tran, S. Duong, Hoang Phan, Tung Pham, Dinh Q. Phung, Trung Le. 17 May 2023. (SSL)
Sharpness-Aware Minimization Alone can Improve Adversarial Robustness. Zeming Wei, Jingyu Zhu, Yihao Zhang. 09 May 2023. (AAML)
DiTTO: A Feature Representation Imitation Approach for Improving Cross-Lingual Transfer. Shanu Kumar, Abbaraju Soujanya, Sandipan Dandapat, Sunayana Sitaram, Monojit Choudhury. 04 Mar 2023. (VLM)
On Statistical Properties of Sharpness-Aware Minimization: Provable Guarantees. Kayhan Behdin, Rahul Mazumder. 23 Feb 2023.
mSAM: Micro-Batch-Averaged Sharpness-Aware Minimization. Kayhan Behdin, Qingquan Song, Aman Gupta, S. Keerthi, Ayan Acharya, Borja Ocejo, Gregory Dexter, Rajiv Khanna, D. Durfee, Rahul Mazumder. 19 Feb 2023. (AAML)
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon. Atish Agarwala, Yann N. Dauphin. 17 Feb 2023.
Flat Seeking Bayesian Neural Networks. Van-Anh Nguyen, L. Vuong, Hoang Phan, Thanh-Toan Do, Dinh Q. Phung, Trung Le. 06 Feb 2023. (BDL)
An SDE for Modeling SAM: Theory and Insights. Enea Monzio Compagnoni, Luca Biggio, Antonio Orvieto, F. Proske, Hans Kersting, Aurelien Lucchi. 19 Jan 2023.
Cramming: Training a Language Model on a Single GPU in One Day. Jonas Geiping, Tom Goldstein. 28 Dec 2022. (MoE)
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data. Harsh Rangwani, Sumukh K Aithal, Mayank Mishra, R. Venkatesh Babu. 28 Dec 2022.
DSI++: Updating Transformer Memory with New Documents. Sanket Vaibhav Mehta, Jai Gupta, Yi Tay, Mostafa Dehghani, Vinh Q. Tran, J. Rao, Marc Najork, Emma Strubell, Donald Metzler. 19 Dec 2022. (CLL)
Improving Generalization of Pre-trained Language Models via Stochastic Weight Averaging. Peng Lu, I. Kobyzev, Mehdi Rezagholizadeh, Ahmad Rashid, A. Ghodsi, Philippe Langlais. 12 Dec 2022. (MoMe)
Improved Deep Neural Network Generalization Using m-Sharpness-Aware Minimization. Kayhan Behdin, Qingquan Song, Aman Gupta, D. Durfee, Ayan Acharya, S. Keerthi, Rahul Mazumder. 07 Dec 2022. (AAML)
Improving Multi-task Learning via Seeking Task-based Flat Regions. Hoang Phan, Lam C. Tran, Ngoc N. Tran, Nhat Ho, Dinh Q. Phung, Trung Le. 24 Nov 2022.
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models. Hong Liu, Sang Michael Xie, Zhiyuan Li, Tengyu Ma. 25 Oct 2022. (AI4CE)
K-SAM: Sharpness-Aware Minimization at the Speed of SGD. Renkun Ni, Ping Yeh-Chiang, Jonas Geiping, Micah Goldblum, A. Wilson, Tom Goldstein. 23 Oct 2022.
Can Language Representation Models Think in Bets? Zhi-Bin Tang, Mayank Kejriwal. 14 Oct 2022.
Improving Sharpness-Aware Minimization with Fisher Mask for Better Generalization on Language Models. Qihuang Zhong, Liang Ding, Li Shen, Peng Mi, Juhua Liu, Bo Du, Dacheng Tao. 11 Oct 2022. (AAML)
State-of-the-art generalisation research in NLP: A taxonomy and review. Dieuwke Hupkes, Mario Giulianelli, Verna Dankers, Mikel Artetxe, Yanai Elazar, ..., Leila Khalatbari, Maria Ryskina, Rita Frieske, Ryan Cotterell, Zhijing Jin. 06 Oct 2022.
SAM as an Optimal Relaxation of Bayes. Thomas Möllenhoff, Mohammad Emtiyaz Khan. 04 Oct 2022. (BDL)
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima. Peter L. Bartlett, Philip M. Long, Olivier Bousquet. 04 Oct 2022.
Towards Bridging the Performance Gaps of Joint Energy-based Models. Xiulong Yang, Qing Su, Shihao Ji. 16 Sep 2022. (VLM)
Model Generalization: A Sharpness Aware Optimization Perspective. Jozef Marus Coldenhoff, Chengkun Li, Yurui Zhu. 14 Aug 2022.
Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning. Momin Abbas, Quan-Wu Xiao, Lisha Chen, Pin-Yu Chen, Tianyi Chen. 08 Jun 2022.
Train Flat, Then Compress: Sharpness-Aware Minimization Learns More Compressible Models. Clara Na, Sanket Vaibhav Mehta, Emma Strubell. 25 May 2022.
Improving Generalization in Federated Learning by Seeking Flat Minima. Debora Caldarola, Barbara Caputo, Marco Ciccone. 22 Mar 2022. (FedML)
Randomized Sharpness-Aware Training for Boosting Computational Efficiency in Deep Learning. Yang Zhao, Hao Zhang, Xiuyuan Hu. 18 Mar 2022.
When Do Flat Minima Optimizers Work? Jean Kaddour, Linqing Liu, Ricardo M. A. Silva, Matt J. Kusner. 01 Feb 2022. (ODL)
Sharpness-Aware Minimization with Dynamic Reweighting. Wenxuan Zhou, Fangyu Liu, Huan Zhang, Muhao Chen. 16 Dec 2021. (AAML)
MLP-Mixer: An all-MLP Architecture for Vision. Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy. 04 May 2021.
High-Performance Large-Scale Image Recognition Without Normalization. Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan. 11 Feb 2021. (VLM)
FreeLB: Enhanced Adversarial Training for Natural Language Understanding. Chen Zhu, Yu Cheng, Zhe Gan, S. Sun, Tom Goldstein, Jingjing Liu. 25 Sep 2019. (AAML)
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018. (ELM)
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima. N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang. 15 Sep 2016. (ODL)