ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2110.08529
  4. Cited By
Sharpness-Aware Minimization Improves Language Model Generalization

Sharpness-Aware Minimization Improves Language Model Generalization

16 October 2021
Dara Bahri
H. Mobahi
Yi Tay
ArXivPDFHTML

Papers citing "Sharpness-Aware Minimization Improves Language Model Generalization"

50 / 86 papers shown
Title
Improving Generalization of Medical Image Registration Foundation Model
Improving Generalization of Medical Image Registration Foundation Model
Jing Hu
Kaiwei Yu
Hongjiang Xian
Shu Hu
Xin Wang
MedIm
34
0
0
10 May 2025
Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization
Unified Enhancement of the Generalization and Robustness of Language Models via Bi-Stage Optimization
Yizhou Sun
Juan Yin
Juan Zhao
Fan Zhang
Yongheng Liu
Hongji Chen
37
0
0
19 Mar 2025
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
SASSHA: Sharpness-aware Adaptive Second-order Optimization with Stable Hessian Approximation
Dahun Shin
Dongyeop Lee
Jinseok Chung
Namhoon Lee
ODL
AAML
177
0
0
25 Feb 2025
Understanding Silent Data Corruption in LLM Training
Understanding Silent Data Corruption in LLM Training
Jeffrey Ma
Hengzhi Pei
Leonard Lausen
George Karypis
37
0
0
17 Feb 2025
Elucidating the Design Space of Dataset Condensation
Elucidating the Design Space of Dataset Condensation
Shitong Shao
Zikai Zhou
Huanran Chen
Zhiqiang Shen
DD
54
7
0
20 Jan 2025
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
186
0
0
18 Dec 2024
Implicit Regularization of Sharpness-Aware Minimization for
  Scale-Invariant Problems
Implicit Regularization of Sharpness-Aware Minimization for Scale-Invariant Problems
Bingcong Li
Liang Zhang
Niao He
43
3
0
18 Oct 2024
Sharpness-Aware Black-Box Optimization
Sharpness-Aware Black-Box Optimization
Feiyang Ye
Yueming Lyu
Xuehao Wang
Masashi Sugiyama
Yu-Jie Zhang
Ivor W. Tsang
AAML
47
0
0
16 Oct 2024
Model Balancing Helps Low-data Training and Fine-tuning
Model Balancing Helps Low-data Training and Fine-tuning
Zihang Liu
Yihan Hu
Tianyu Pang
Yefan Zhou
Pu Ren
Yaoqing Yang
36
2
0
16 Oct 2024
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
Combinatorial Multi-armed Bandits: Arm Selection via Group Testing
Arpan Mukherjee
Shashanka Ubaru
K. Murugesan
Karthikeyan Shanmugam
A. Tajer
41
0
0
14 Oct 2024
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
Zhanpeng Zhou
Mingze Wang
Yuchen Mao
Bingrui Li
Junchi Yan
AAML
62
0
0
14 Oct 2024
Improving Generalization with Flat Hilbert Bayesian Inference
Improving Generalization with Flat Hilbert Bayesian Inference
Tuan Truong
Quyen Tran
Quan Pham-Ngoc
Nhat Ho
Dinh Q. Phung
Trung Le
26
0
0
05 Oct 2024
COSCO: A Sharpness-Aware Training Framework for Few-shot Multivariate
  Time Series Classification
COSCO: A Sharpness-Aware Training Framework for Few-shot Multivariate Time Series Classification
Jesus Barreda
Ashley Gomez
Ruben Puga
Kaixiong Zhou
Li Zhang
AI4TS
18
0
0
15 Sep 2024
Enhancing Sharpness-Aware Minimization by Learning Perturbation Radius
Enhancing Sharpness-Aware Minimization by Learning Perturbation Radius
Xuehao Wang
Weisen Jiang
Shuai Fu
Yu Zhang
AAML
44
0
0
15 Aug 2024
Do Sharpness-based Optimizers Improve Generalization in Medical Image
  Analysis?
Do Sharpness-based Optimizers Improve Generalization in Medical Image Analysis?
Mohamed Hassan
Aleksandar Vakanski
Min Xian
AAML
MedIm
41
1
0
07 Aug 2024
Improving SAM Requires Rethinking its Optimization Formulation
Improving SAM Requires Rethinking its Optimization Formulation
Wanyun Xie
Fabian Latorre
Kimon Antonakopoulos
Thomas Pethick
V. Cevher
42
1
0
17 Jul 2024
Efficient Sharpness-Aware Minimization for Molecular Graph Transformer
  Models
Efficient Sharpness-Aware Minimization for Molecular Graph Transformer Models
Yili Wang
Kaixiong Zhou
Ninghao Liu
Ying Wang
Xin Wang
40
10
0
19 Jun 2024
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Asymptotic Unbiased Sample Sampling to Speed Up Sharpness-Aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
66
1
0
12 Jun 2024
Agnostic Sharpness-Aware Minimization
Agnostic Sharpness-Aware Minimization
Van-Anh Nguyen
Quyen Tran
Tuan Truong
Thanh-Toan Do
Dinh Q. Phung
Trung Le
46
0
0
11 Jun 2024
Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM
  Dynamics
Forget Sharpness: Perturbed Forgetting of Model Biases Within SAM Dynamics
Ankit Vani
Frederick Tung
Gabriel L. Oliveira
Hossein Sharifi-Noghabi
AAML
35
0
0
10 Jun 2024
A Universal Class of Sharpness-Aware Minimization Algorithms
A Universal Class of Sharpness-Aware Minimization Algorithms
B. Tahmasebi
Ashkan Soleymani
Dara Bahri
Stefanie Jegelka
P. Jaillet
AAML
52
2
0
06 Jun 2024
Sharpness-Aware Minimization Enhances Feature Quality via Balanced
  Learning
Sharpness-Aware Minimization Enhances Feature Quality via Balanced Learning
Jacob Mitchell Springer
Vaishnavh Nagarajan
Aditi Raghunathan
44
5
0
30 May 2024
A Hybrid Generative and Discriminative PointNet on Unordered Point Sets
A Hybrid Generative and Discriminative PointNet on Unordered Point Sets
Yang Ye
Shihao Ji
PINN
3DPC
41
0
0
19 Apr 2024
Flatness Improves Backbone Generalisation in Few-shot Classification
Flatness Improves Backbone Generalisation in Few-shot Classification
Rui Li
Martin Trapp
Marcus Klasson
Arno Solin
45
0
0
11 Apr 2024
From Robustness to Improved Generalization and Calibration in
  Pre-trained Language Models
From Robustness to Improved Generalization and Calibration in Pre-trained Language Models
Josip Jukić
Jan Snajder
37
0
0
31 Mar 2024
Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
Diversity-Aware Agnostic Ensemble of Sharpness Minimizers
Anh-Vu Bui
Vy Vo
Tung Pham
Dinh Q. Phung
Trung Le
FedML
UQCV
37
1
0
19 Mar 2024
On the Duality Between Sharpness-Aware Minimization and Adversarial
  Training
On the Duality Between Sharpness-Aware Minimization and Adversarial Training
Yihao Zhang
Hangzhou He
Jingyu Zhu
Huanran Chen
Yifei Wang
Zeming Wei
AAML
34
9
0
23 Feb 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
79
5
0
22 Jan 2024
Stabilizing Sharpness-aware Minimization Through A Simple
  Renormalization Strategy
Stabilizing Sharpness-aware Minimization Through A Simple Renormalization Strategy
Chengli Tan
Jiangshe Zhang
Junmin Liu
Yicheng Wang
Yunda Hao
AAML
34
1
0
14 Jan 2024
CR-SAM: Curvature Regularized Sharpness-Aware Minimization
CR-SAM: Curvature Regularized Sharpness-Aware Minimization
Tao Wu
Tie Luo
D. C. Wunsch
16
3
0
21 Dec 2023
RoAST: Robustifying Language Models via Adversarial Perturbation with
  Selective Training
RoAST: Robustifying Language Models via Adversarial Perturbation with Selective Training
Jaehyung Kim
Yuning Mao
Rui Hou
Hanchao Yu
Davis Liang
Pascale Fung
Qifan Wang
Fuli Feng
Lifu Huang
Madian Khabsa
AAML
23
2
0
07 Dec 2023
Critical Influence of Overparameterization on Sharpness-aware Minimization
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
44
1
0
29 Nov 2023
Robust Contrastive Learning With Theory Guarantee
Robust Contrastive Learning With Theory Guarantee
Ngoc N. Tran
Lam C. Tran
Hoang Phan
Anh-Vu Bui
Tung Pham
Toan M. Tran
Dinh Q. Phung
Trung Le
SSL
NoLa
26
0
0
16 Nov 2023
Why Does Sharpness-Aware Minimization Generalize Better Than SGD?
Why Does Sharpness-Aware Minimization Generalize Better Than SGD?
Zixiang Chen
Junkai Zhang
Yiwen Kou
Xiangning Chen
Cho-Jui Hsieh
Quanquan Gu
32
13
0
11 Oct 2023
Entropy-MCMC: Sampling from Flat Basins with Ease
Entropy-MCMC: Sampling from Flat Basins with Ease
Bolian Li
Ruqi Zhang
25
5
0
09 Oct 2023
TRAM: Bridging Trust Regions and Sharpness Aware Minimization
TRAM: Bridging Trust Regions and Sharpness Aware Minimization
Tom Sherborne
Naomi Saphra
Pradeep Dasigi
Hao Peng
32
4
0
05 Oct 2023
RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization
RSAM: Learning on manifolds with Riemannian Sharpness-aware Minimization
Kenneth Allen
Hoang-Phi Nguyen
Tung Pham
Ming-Jun Lai
Mehrtash Harandi
Dinh Q. Phung
Trung Le
AAML
34
3
0
29 Sep 2023
Baby Llama: knowledge distillation from an ensemble of teachers trained
  on a small dataset with no performance penalty
Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty
I. Timiryasov
J. Tastet
13
46
0
03 Aug 2023
Systematic Investigation of Sparse Perturbed Sharpness-Aware
  Minimization Optimizer
Systematic Investigation of Sparse Perturbed Sharpness-Aware Minimization Optimizer
Peng Mi
Li Shen
Tianhe Ren
Yiyi Zhou
Tianshuo Xu
Xiaoshuai Sun
Tongliang Liu
Rongrong Ji
Dacheng Tao
AAML
33
2
0
30 Jun 2023
PLASTIC: Improving Input and Label Plasticity for Sample Efficient
  Reinforcement Learning
PLASTIC: Improving Input and Label Plasticity for Sample Efficient Reinforcement Learning
Hojoon Lee
Hanseul Cho
Hyunseung Kim
Daehoon Gwak
Joonkee Kim
Jaegul Choo
Se-Young Yun
Chulhee Yun
OffRL
82
25
0
19 Jun 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to
  Optima
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Dongkuk Si
Chulhee Yun
28
15
0
16 Jun 2023
The Split Matters: Flat Minima Methods for Improving the Performance of
  GNNs
The Split Matters: Flat Minima Methods for Improving the Performance of GNNs
N. Lell
A. Scherp
40
1
0
15 Jun 2023
Gradient Ascent Post-training Enhances Language Model Generalization
Gradient Ascent Post-training Enhances Language Model Generalization
Dongkeun Yoon
Joel Jang
Sungdong Kim
Minjoon Seo
VLM
AI4CE
21
3
0
12 Jun 2023
Differentially Private Sharpness-Aware Training
Differentially Private Sharpness-Aware Training
Jinseong Park
Hoki Kim
Yujin Choi
Jaewook Lee
27
8
0
09 Jun 2023
Normalization Layers Are All That Sharpness-Aware Minimization Needs
Normalization Layers Are All That Sharpness-Aware Minimization Needs
Maximilian Mueller
Tiffany J. Vlaar
David Rolnick
Matthias Hein
27
18
0
07 Jun 2023
Optimal Transport Model Distributional Robustness
Optimal Transport Model Distributional Robustness
Van-Anh Nguyen
Trung Le
Anh Tuan Bui
Thanh-Toan Do
Dinh Q. Phung
OOD
30
3
0
07 Jun 2023
Decentralized SGD and Average-direction SAM are Asymptotically
  Equivalent
Decentralized SGD and Average-direction SAM are Asymptotically Equivalent
Tongtian Zhu
Fengxiang He
Kaixuan Chen
Mingli Song
Dacheng Tao
34
15
0
05 Jun 2023
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio
  Anti-spoofing
Multi-Dataset Co-Training with Sharpness-Aware Optimization for Audio Anti-spoofing
Hye-jin Shim
Jee-weon Jung
Tomi Kinnunen
21
13
0
31 May 2023
Sharpness-Aware Minimization Revisited: Weighted Sharpness as a
  Regularization Term
Sharpness-Aware Minimization Revisited: Weighted Sharpness as a Regularization Term
Yun Yue
Jiadi Jiang
Zhiling Ye
Ni Gao
Yongchao Liu
Kecheng Zhang
MLAU
ODL
17
11
0
25 May 2023
How to escape sharp minima with random perturbations
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
32
6
0
25 May 2023
12
Next