Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2303.03108
Cited By
Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization
3 March 2023
Xingxuan Zhang
Renzhe Xu
Han Yu
Hao Zou
Peng Cui
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Gradient Norm Aware Minimization Seeks First-Order Flatness and Improves Generalization"
14 / 14 papers shown
Title
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Sunwoo Lee
112
0
0
18 Mar 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Zhihui Guo
Zhiming Liu
Qianli Ren
Yuexian Zou
83
4
0
10 Feb 2025
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
216
0
0
18 Dec 2024
Meta Curvature-Aware Minimization for Domain Generalization
Zhaoyu Chen
Yiwen Ye
Feilong Tang
Yongsheng Pan
Yong-quan Xia
BDL
218
1
0
16 Dec 2024
Deep Companion Learning: Enhancing Generalization Through Historical Consistency
Ruizhao Zhu
Venkatesh Saligrama
FedML
40
0
0
26 Jul 2024
Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Tian Wang
48
1
0
24 Feb 2024
Continual Learning through Networks Splitting and Merging with Dreaming-Meta-Weighted Model Fusion
Yi Sun
Xin Xu
Jian Li
Guanglei Xie
Yifei Shi
Qiang Fang
CLL
MoMe
34
1
0
12 Dec 2023
Practical Sharpness-Aware Minimization Cannot Converge All the Way to Optima
Dongkuk Si
Chulhee Yun
28
15
0
16 Jun 2023
Efficient Sharpness-aware Minimization for Improved Training of Neural Networks
Jiawei Du
Hanshu Yan
Jiashi Feng
Qiufeng Wang
Liangli Zhen
Rick Siow Mong Goh
Vincent Y. F. Tan
AAML
113
132
0
07 Oct 2021
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
159
234
0
04 Mar 2020
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,225
0
16 Nov 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
308
2,890
0
15 Sep 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,217
0
01 Sep 2014
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
266
7,638
0
03 Jul 2012
1