Entropy-SGD: Biasing Gradient Descent Into Wide Valleys


6 November 2016
Pratik Chaudhari
A. Choromańska
Stefano Soatto
Yann LeCun
Carlo Baldassi
C. Borgs
J. Chayes
Levent Sagun
R. Zecchina
    ODL

Papers citing "Entropy-SGD: Biasing Gradient Descent Into Wide Valleys"

50 / 163 papers shown
Entropy-Guided Sampling of Flat Modes in Discrete Spaces
Pinaki Mohanty
Riddhiman Bhattacharya
Ruqi Zhang
146
0
0
05 May 2025
ParamΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
80
0
0
23 Apr 2025
Sharpness-Aware Parameter Selection for Machine Unlearning
Saber Malekmohammadi
Hong kyu Lee
Li Xiong
MU
160
0
0
08 Apr 2025
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Sunwoo Lee
112
0
0
18 Mar 2025
Early Stopping Against Label Noise Without Validation Data
Suqin Yuan
Lei Feng
Tongliang Liu
NoLa
101
15
0
11 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
82
0
0
28 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
108
1
0
21 Jan 2025
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
186
0
0
18 Dec 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurelien Lucchi
AI4CE
45
0
0
04 Nov 2024
Generalization Measures for Zero-Shot Cross-Lingual Transfer
Saksham Bassi
Duygu Ataman
Kyunghyun Cho
29
0
0
24 Apr 2024
Revisiting Confidence Estimation: Towards Reliable Failure Prediction
Fei Zhu
Xu-Yao Zhang
Zhen Cheng
Cheng-Lin Liu
UQCV
49
10
0
05 Mar 2024
Effective Gradient Sample Size via Variation Estimation for Accelerating Sharpness aware Minimization
Jiaxin Deng
Junbiao Pang
Baochang Zhang
Tian Wang
45
1
0
24 Feb 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
79
5
0
22 Jan 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
44
1
0
29 Nov 2023
FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation
Minghui Chen
Meirui Jiang
Qianming Dou
Zehua Wang
Xiaoxiao Li
FedML
35
15
0
20 Jul 2023
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
36
2
0
14 Jul 2023
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
32
6
0
25 May 2023
Sparse Weight Averaging with Multiple Particles for Iterative Magnitude Pruning
Moonseok Choi
Hyungi Lee
G. Nam
Juho Lee
34
2
0
24 May 2023
Improving Convergence and Generalization Using Parameter Symmetries
Bo-Lu Zhao
Robert Mansel Gower
Robin G. Walters
Rose Yu
MoMe
33
13
0
22 May 2023
GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong
Joonsang Yu
Geondo Park
Dongyoon Han
Y. Yoo
30
4
0
15 May 2023
Phase transitions in the mini-batch size for sparse and dense two-layer neural networks
Raffaele Marino
F. Ricci-Tersenghi
30
14
0
10 May 2023
An Adaptive Policy to Employ Sharpness-Aware Minimization
Weisen Jiang
Hansi Yang
Yu Zhang
James T. Kwok
AAML
83
31
0
28 Apr 2023
Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
Zhuo Huang
Miaoxi Zhu
Xiaobo Xia
Li Shen
Jun Yu
Chen Gong
Bo Han
Bo Du
Tongliang Liu
32
31
0
23 Mar 2023
Randomized Adversarial Training via Taylor Expansion
Gao Jin
Xinping Yi
Dengyu Wu
Ronghui Mu
Xiaowei Huang
AAML
44
34
0
19 Mar 2023
Rethinking Confidence Calibration for Failure Prediction
Fei Zhu
Zhen Cheng
Xu-Yao Zhang
Cheng-Lin Liu
UQCV
22
39
0
06 Mar 2023
Average of Pruning: Improving Performance and Stability of Out-of-Distribution Detection
Zhen Cheng
Fei Zhu
Xu-Yao Zhang
Cheng-Lin Liu
MoMe
OODD
40
11
0
02 Mar 2023
ASP: Learn a Universal Neural Solver!
Chenguang Wang
Zhouliang Yu
Stephen Marcus McAleer
Tianshu Yu
Yao-Chun Yang
AAML
32
24
0
01 Mar 2023
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Atish Agarwala
Yann N. Dauphin
21
20
0
17 Feb 2023
The autoregressive neural network architecture of the Boltzmann distribution of pairwise interacting spins systems
I. Biazzo
AI4CE
28
7
0
16 Feb 2023
The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
Agustinus Kristiadi
Felix Dangel
Philipp Hennig
32
11
0
14 Feb 2023
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Antonio Sclocchi
Mario Geiger
M. Wyart
40
6
0
31 Jan 2023
Stability Analysis of Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Jaewook Lee
36
12
0
16 Jan 2023
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
31
28
0
28 Dec 2022
KL Regularized Normalization Framework for Low Resource Tasks
Neeraj Kumar
Ankur Narang
Brejesh Lall
26
1
0
21 Dec 2022
Cross-Domain Ensemble Distillation for Domain Generalization
Kyung-Jin Lee
Sungyeon Kim
Suha Kwak
FedML
OOD
26
38
0
25 Nov 2022
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi
Marc Finzi
Sanyam Kapoor
Andres Potapczynski
Micah Goldblum
A. Wilson
BDL
MLT
AI4CE
29
51
0
24 Nov 2022
Non-reversible Parallel Tempering for Deep Posterior Approximation
Wei Deng
Qian Zhang
Qi Feng
F. Liang
Guang Lin
23
4
0
20 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
Bo-Lu Zhao
I. Ganev
Robin G. Walters
Rose Yu
Nima Dehmamy
47
16
0
31 Oct 2022
Rethinking Sharpness-Aware Minimization as Variational Inference
Szilvia Ujváry
Zsigmond Telek
A. Kerekes
Anna Mészáros
Ferenc Huszár
33
8
0
19 Oct 2022
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models
Nikolaos Dimitriadis
P. Frossard
François Fleuret
26
25
0
18 Oct 2022
ROSE: Robust Selective Fine-tuning for Pre-trained Language Models
Lan Jiang
Hao Zhou
Yankai Lin
Peng Li
Jie Zhou
R. Jiang
AAML
37
8
0
18 Oct 2022
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Peter L. Bartlett
Philip M. Long
Olivier Bousquet
76
34
0
04 Oct 2022
Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization
Danni Peng
Sinno Jialin Pan
34
2
0
29 Sep 2022
A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
James Harrison
Luke Metz
Jascha Narain Sohl-Dickstein
47
22
0
22 Sep 2022
Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning
Christian Raymond
Qi Chen
Bing Xue
Mengjie Zhang
FedML
29
11
0
19 Sep 2022
FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification
Yunqing Zhao
Ngai-man Cheung
BDL
23
12
0
23 Aug 2022
On Leave-One-Out Conditional Mutual Information For Generalization
Mohamad Rida Rammal
Alessandro Achille
Aditya Golatkar
Suhas Diggavi
Stefano Soatto
VLM
28
5
0
01 Jul 2022
Sparse Double Descent: Where Network Pruning Aggravates Overfitting
Zhengqi He
Zeke Xie
Quanzhi Zhu
Zengchang Qin
74
27
0
17 Jun 2022
Generalized Federated Learning via Sharpness Aware Minimization
Zhe Qu
Xingyu Li
Rui Duan
Yaojiang Liu
Bo Tang
Zhuo Lu
FedML
31
131
0
06 Jun 2022
Information-Theoretic Odometry Learning
Sen Zhang
Jing Zhang
Dacheng Tao
15
5
0
11 Mar 2022