ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1611.01838
  4. Cited By
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

6 November 2016
Pratik Chaudhari
A. Choromańska
Stefano Soatto
Yann LeCun
Carlo Baldassi
C. Borgs
J. Chayes
Levent Sagun
R. Zecchina
    ODL
ArXivPDFHTML

Papers citing "Entropy-SGD: Biasing Gradient Descent Into Wide Valleys"

50 / 129 papers shown
Title
Entropy-Guided Sampling of Flat Modes in Discrete Spaces
Entropy-Guided Sampling of Flat Modes in Discrete Spaces
Pinaki Mohanty
Riddhiman Bhattacharya
Ruqi Zhang
128
0
0
05 May 2025
Param$Δ$ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
ParamΔΔΔ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Sheng Cao
Mingrui Wu
Karthik Prasad
Yuandong Tian
Zechun Liu
MoMe
80
0
0
23 Apr 2025
Sharpness-Aware Parameter Selection for Machine Unlearning
Sharpness-Aware Parameter Selection for Machine Unlearning
Saber Malekmohammadi
Hong kyu Lee
Li Xiong
MU
145
0
0
08 Apr 2025
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Layer-wise Adaptive Gradient Norm Penalizing Method for Efficient and Accurate Deep Learning
Sunwoo Lee
98
0
0
18 Mar 2025
Early Stopping Against Label Noise Without Validation Data
Early Stopping Against Label Noise Without Validation Data
Suqin Yuan
Lei Feng
Tongliang Liu
NoLa
96
14
0
11 Feb 2025
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Evidence on the Regularisation Properties of Maximum-Entropy Reinforcement Learning
Rémy Hosseinkhan Boucher
Onofrio Semeraro
L. Mathelin
74
0
0
28 Jan 2025
FOCUS: First Order Concentrated Updating Scheme
FOCUS: First Order Concentrated Updating Scheme
Yizhou Liu
Ziming Liu
Jeff Gore
ODL
108
1
0
21 Jan 2025
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Seeking Consistent Flat Minima for Better Domain Generalization via Refining Loss Landscapes
Aodi Li
Liansheng Zhuang
Xiao Long
Minghong Yao
Shafei Wang
180
0
0
18 Dec 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks
Jim Zhao
Sidak Pal Singh
Aurélien Lucchi
AI4CE
39
0
0
04 Nov 2024
Generalization Measures for Zero-Shot Cross-Lingual Transfer
Generalization Measures for Zero-Shot Cross-Lingual Transfer
Saksham Bassi
Duygu Ataman
Kyunghyun Cho
29
0
0
24 Apr 2024
Revisiting Confidence Estimation: Towards Reliable Failure Prediction
Revisiting Confidence Estimation: Towards Reliable Failure Prediction
Fei Zhu
Xu-Yao Zhang
Zhen Cheng
Cheng-Lin Liu
UQCV
46
10
0
05 Mar 2024
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Momentum-SAM: Sharpness Aware Minimization without Computational Overhead
Marlon Becker
Frederick Altrock
Benjamin Risse
74
5
0
22 Jan 2024
Critical Influence of Overparameterization on Sharpness-aware Minimization
Critical Influence of Overparameterization on Sharpness-aware Minimization
Sungbin Shin
Dongyeop Lee
Maksym Andriushchenko
Namhoon Lee
AAML
44
1
0
29 Nov 2023
FedSoup: Improving Generalization and Personalization in Federated
  Learning via Selective Model Interpolation
FedSoup: Improving Generalization and Personalization in Federated Learning via Selective Model Interpolation
Minghui Chen
Meirui Jiang
Qianming Dou
Zehua Wang
Xiaoxiao Li
FedML
30
15
0
20 Jul 2023
Multiplicative update rules for accelerating deep learning training and
  increasing robustness
Multiplicative update rules for accelerating deep learning training and increasing robustness
Manos Kirtas
Nikolaos Passalis
Anastasios Tefas
AAML
OOD
32
2
0
14 Jul 2023
How to escape sharp minima with random perturbations
How to escape sharp minima with random perturbations
Kwangjun Ahn
Ali Jadbabaie
S. Sra
ODL
26
6
0
25 May 2023
GeNAS: Neural Architecture Search with Better Generalization
GeNAS: Neural Architecture Search with Better Generalization
Joonhyun Jeong
Joonsang Yu
Geondo Park
Dongyoon Han
Y. Yoo
25
4
0
15 May 2023
Phase transitions in the mini-batch size for sparse and dense two-layer
  neural networks
Phase transitions in the mini-batch size for sparse and dense two-layer neural networks
Raffaele Marino
F. Ricci-Tersenghi
27
14
0
10 May 2023
An Adaptive Policy to Employ Sharpness-Aware Minimization
An Adaptive Policy to Employ Sharpness-Aware Minimization
Weisen Jiang
Hansi Yang
Yu Zhang
James T. Kwok
AAML
81
31
0
28 Apr 2023
Robust Generalization against Photon-Limited Corruptions via Worst-Case
  Sharpness Minimization
Robust Generalization against Photon-Limited Corruptions via Worst-Case Sharpness Minimization
Zhuo Huang
Miaoxi Zhu
Xiaobo Xia
Li Shen
Jun Yu
Chen Gong
Bo Han
Bo Du
Tongliang Liu
32
31
0
23 Mar 2023
Randomized Adversarial Training via Taylor Expansion
Randomized Adversarial Training via Taylor Expansion
Gao Jin
Xinping Yi
Dengyu Wu
Ronghui Mu
Xiaowei Huang
AAML
36
34
0
19 Mar 2023
Rethinking Confidence Calibration for Failure Prediction
Rethinking Confidence Calibration for Failure Prediction
Fei Zhu
Zhen Cheng
Xu-Yao Zhang
Cheng-Lin Liu
UQCV
14
39
0
06 Mar 2023
Average of Pruning: Improving Performance and Stability of
  Out-of-Distribution Detection
Average of Pruning: Improving Performance and Stability of Out-of-Distribution Detection
Zhen Cheng
Fei Zhu
Xu-Yao Zhang
Cheng-Lin Liu
MoMe
OODD
40
11
0
02 Mar 2023
ASP: Learn a Universal Neural Solver!
ASP: Learn a Universal Neural Solver!
Chenguang Wang
Zhouliang Yu
Stephen Marcus McAleer
Tianshu Yu
Yao-Chun Yang
AAML
32
23
0
01 Mar 2023
SAM operates far from home: eigenvalue regularization as a dynamical
  phenomenon
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
Atish Agarwala
Yann N. Dauphin
19
20
0
17 Feb 2023
The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
The Geometry of Neural Nets' Parameter Spaces Under Reparametrization
Agustinus Kristiadi
Felix Dangel
Philipp Hennig
22
11
0
14 Feb 2023
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Dissecting the Effects of SGD Noise in Distinct Regimes of Deep Learning
Antonio Sclocchi
Mario Geiger
M. Wyart
32
6
0
31 Jan 2023
Stability Analysis of Sharpness-Aware Minimization
Stability Analysis of Sharpness-Aware Minimization
Hoki Kim
Jinseong Park
Yujin Choi
Jaewook Lee
28
12
0
16 Jan 2023
Escaping Saddle Points for Effective Generalization on Class-Imbalanced
  Data
Escaping Saddle Points for Effective Generalization on Class-Imbalanced Data
Harsh Rangwani
Sumukh K Aithal
Mayank Mishra
R. Venkatesh Babu
31
27
0
28 Dec 2022
KL Regularized Normalization Framework for Low Resource Tasks
KL Regularized Normalization Framework for Low Resource Tasks
Neeraj Kumar
Ankur Narang
Brejesh Lall
21
1
0
21 Dec 2022
PAC-Bayes Compression Bounds So Tight That They Can Explain
  Generalization
PAC-Bayes Compression Bounds So Tight That They Can Explain Generalization
Sanae Lotfi
Marc Finzi
Sanyam Kapoor
Andres Potapczynski
Micah Goldblum
A. Wilson
BDL
MLT
AI4CE
21
51
0
24 Nov 2022
Symmetries, flat minima, and the conserved quantities of gradient flow
Symmetries, flat minima, and the conserved quantities of gradient flow
Bo-Lu Zhao
I. Ganev
Robin G. Walters
Rose Yu
Nima Dehmamy
44
16
0
31 Oct 2022
Rethinking Sharpness-Aware Minimization as Variational Inference
Rethinking Sharpness-Aware Minimization as Variational Inference
Szilvia Ujváry
Zsigmond Telek
A. Kerekes
Anna Mészáros
Ferenc Huszár
25
8
0
19 Oct 2022
Pareto Manifold Learning: Tackling multiple tasks via ensembles of
  single-task models
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models
Nikolaos Dimitriadis
P. Frossard
Franccois Fleuret
16
25
0
18 Oct 2022
ROSE: Robust Selective Fine-tuning for Pre-trained Language Models
ROSE: Robust Selective Fine-tuning for Pre-trained Language Models
Lan Jiang
Hao Zhou
Yankai Lin
Peng Li
Jie Zhou
R. Jiang
AAML
27
8
0
18 Oct 2022
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines
  and Drifting Towards Wide Minima
The Dynamics of Sharpness-Aware Minimization: Bouncing Across Ravines and Drifting Towards Wide Minima
Peter L. Bartlett
Philip M. Long
Olivier Bousquet
68
34
0
04 Oct 2022
Learning Gradient-based Mixup towards Flatter Minima for Domain
  Generalization
Learning Gradient-based Mixup towards Flatter Minima for Domain Generalization
Danni Peng
Sinno Jialin Pan
29
2
0
29 Sep 2022
A Closer Look at Learned Optimization: Stability, Robustness, and
  Inductive Biases
A Closer Look at Learned Optimization: Stability, Robustness, and Inductive Biases
James Harrison
Luke Metz
Jascha Narain Sohl-Dickstein
44
22
0
22 Sep 2022
Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning
Learning Symbolic Model-Agnostic Loss Functions via Meta-Learning
Christian Raymond
Qi Chen
Bing Xue
Mengjie Zhang
FedML
27
11
0
19 Sep 2022
FS-BAN: Born-Again Networks for Domain Generalization Few-Shot
  Classification
FS-BAN: Born-Again Networks for Domain Generalization Few-Shot Classification
Yunqing Zhao
Ngai-man Cheung
BDL
21
12
0
23 Aug 2022
Sparse Double Descent: Where Network Pruning Aggravates Overfitting
Sparse Double Descent: Where Network Pruning Aggravates Overfitting
Zhengqi He
Zeke Xie
Quanzhi Zhu
Zengchang Qin
67
27
0
17 Jun 2022
Generalized Federated Learning via Sharpness Aware Minimization
Generalized Federated Learning via Sharpness Aware Minimization
Zhe Qu
Xingyu Li
Rui Duan
Yaojiang Liu
Bo Tang
Zhuo Lu
FedML
20
130
0
06 Jun 2022
Information-Theoretic Odometry Learning
Information-Theoretic Odometry Learning
Sen Zhang
Jing Zhang
Dacheng Tao
15
5
0
11 Mar 2022
Tackling benign nonconvexity with smoothing and stochastic gradients
Tackling benign nonconvexity with smoothing and stochastic gradients
Harsh Vardhan
Sebastian U. Stich
18
8
0
18 Feb 2022
Approximate Nearest Neighbor Search under Neural Similarity Metric for
  Large-Scale Recommendation
Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation
Rihan Chen
Bin Liu
Han Zhu
Yao Wang
Qi Li
...
Q. hua
Junliang Jiang
Yunlong Xu
Hongbo Deng
Bo Zheng
23
20
0
14 Feb 2022
Penalizing Gradient Norm for Efficiently Improving Generalization in
  Deep Learning
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
Yang Zhao
Hao Zhang
Xiuyuan Hu
30
116
0
08 Feb 2022
Anticorrelated Noise Injection for Improved Generalization
Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto
Hans Kersting
F. Proske
Francis R. Bach
Aurélien Lucchi
53
44
0
06 Feb 2022
When Do Flat Minima Optimizers Work?
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
11
58
0
01 Feb 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning
  Optimization Landscape
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
25
34
0
20 Jan 2022
Class-Incremental Continual Learning into the eXtended DER-verse
Class-Incremental Continual Learning into the eXtended DER-verse
Matteo Boschini
Lorenzo Bonicelli
Pietro Buzzega
Angelo Porrello
Simone Calderara
CLL
BDL
21
128
0
03 Jan 2022
123
Next