Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

6 November 2016
Pratik Chaudhari
A. Choromańska
Stefano Soatto
Yann LeCun
Carlo Baldassi
C. Borgs
J. Chayes
Levent Sagun
R. Zecchina
ODL

Papers citing "Entropy-SGD: Biasing Gradient Descent Into Wide Valleys"

50 / 163 papers shown
Tackling benign nonconvexity with smoothing and stochastic gradients
Harsh Vardhan
Sebastian U. Stich
26
8
0
18 Feb 2022
Approximate Nearest Neighbor Search under Neural Similarity Metric for Large-Scale Recommendation
Rihan Chen
Bin Liu
Ziru Xu
Yao Wang
Qi Li
...
Q. Hua
Junliang Jiang
Yunlong Xu
Hongbo Deng
Bo Zheng
34
21
0
14 Feb 2022
Penalizing Gradient Norm for Efficiently Improving Generalization in Deep Learning
Yang Zhao
Hao Zhang
Xiuyuan Hu
32
116
0
08 Feb 2022
Anticorrelated Noise Injection for Improved Generalization
Antonio Orvieto
Hans Kersting
F. Proske
Francis R. Bach
Aurelien Lucchi
53
44
0
06 Feb 2022
When Do Flat Minima Optimizers Work?
Jean Kaddour
Linqing Liu
Ricardo M. A. Silva
Matt J. Kusner
ODL
21
58
0
01 Feb 2022
Low-Pass Filtering SGD for Recovering Flat Optima in the Deep Learning Optimization Landscape
Devansh Bisla
Jing Wang
A. Choromańska
25
34
0
20 Jan 2022
Class-Incremental Continual Learning into the eXtended DER-verse
Matteo Boschini
Lorenzo Bonicelli
Pietro Buzzega
Angelo Porrello
Simone Calderara
CLL
BDL
26
128
0
03 Jan 2022
Sharpness-aware Quantization for Deep Neural Networks
Jing Liu
Jianfei Cai
Bohan Zhuang
MQ
27
24
0
24 Nov 2021
Large-Scale Deep Learning Optimizations: A Comprehensive Survey
Xiaoxin He
Fuzhao Xue
Xiaozhe Ren
Yang You
27
14
0
01 Nov 2021
Does the Data Induce Capacity Control in Deep Learning?
Rubing Yang
J. Mao
Pratik Chaudhari
33
15
0
27 Oct 2021
Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh
A. Jafari
Puneeth Salad
Pranav Sharma
Ali Saheb Pasand
A. Ghodsi
79
17
0
16 Oct 2021
Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri
H. Mobahi
Yi Tay
127
98
0
16 Oct 2021
Observations on K-image Expansion of Image-Mixing Augmentation for Classification
Joonhyun Jeong
Sungmin Cha
Jongwon Choi
Sangdoo Yun
Taesup Moon
Y. Yoo
VLM
21
6
0
08 Oct 2021
On the Generalization of Models Trained with SGD: Information-Theoretic Bounds and Implications
Ziqiao Wang
Yongyi Mao
FedML
MLT
37
22
0
07 Oct 2021
Perturbated Gradients Updating within Unit Space for Deep Learning
Ching-Hsun Tseng
Liu Cheng
Shin-Jye Lee
Xiaojun Zeng
40
5
0
01 Oct 2021
An Expert System for Redesigning Software for Cloud Applications
Rahul Yedida
R. Krishna
A. Kalia
Tim Menzies
Jin Xiao
M. Vukovic
11
4
0
29 Sep 2021
Adversarial Parameter Defense by Multi-Step Risk Minimization
Zhiyuan Zhang
Ruixuan Luo
Xuancheng Ren
Qi Su
Liangyou Li
Xu Sun
AAML
25
6
0
07 Sep 2021
Analytic Study of Families of Spurious Minima in Two-Layer ReLU Neural Networks: A Tale of Symmetry II
Yossi Arjevani
M. Field
28
18
0
21 Jul 2021
Minimum sharpness: Scale-invariant parameter-robustness of neural networks
Hikaru Ibayashi
Takuo Hamaguchi
Masaaki Imaizumi
25
5
0
23 Jun 2021
RDA: Robust Domain Adaptation via Fourier Adversarial Attacking
Jiaxing Huang
Dayan Guan
Aoran Xiao
Shijian Lu
AAML
37
76
0
05 Jun 2021
Relating Adversarially Robust Generalization to Flat Minima
David Stutz
Matthias Hein
Bernt Schiele
OOD
32
65
0
09 Apr 2021
Interpretable Machine Learning: Fundamental Principles and 10 Grand Challenges
Cynthia Rudin
Chaofan Chen
Zhi Chen
Haiyang Huang
Lesia Semenova
Chudi Zhong
FaML
AI4CE
LRM
59
653
0
20 Mar 2021
Siamese Labels Auxiliary Learning
Wenrui Gan
Zhulin Liu
Cheng Chen
Tong Zhang
17
1
0
27 Feb 2021
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon
Jeongseop Kim
Hyunseong Park
I. Choi
31
281
0
23 Feb 2021
Learning Neural Network Subspaces
Mitchell Wortsman
Maxwell Horton
Carlos Guestrin
Ali Farhadi
Mohammad Rastegari
UQCV
27
85
0
20 Feb 2021
Stability of SGD: Tightness Analysis and Improved Bounds
Yikai Zhang
Wenjia Zhang
Sammy Bald
Vamsi Pingali
Chao Chen
Mayank Goswami
MLT
21
36
0
10 Feb 2021
Adversarial Training Makes Weight Loss Landscape Sharper in Logistic Regression
Masanori Yamada
Sekitoshi Kanai
Tomoharu Iwata
Tomokatsu Takahashi
Yuki Yamanaka
Hiroshi Takahashi
Atsutoshi Kumagai
AAML
13
9
0
05 Feb 2021
Combating Mode Collapse in GAN training: An Empirical Analysis using Hessian Eigenvalues
Ricard Durall
Avraam Chatzimichailidis
P. Labus
J. Keuper
GAN
25
57
0
17 Dec 2020
A Deeper Look at the Hessian Eigenspectrum of Deep Neural Networks and its Applications to Regularization
Adepu Ravi Sankar
Yash Khasbage
Rahul Vigneswaran
V. Balasubramanian
25
42
0
07 Dec 2020
Noise and Fluctuation of Finite Learning Rate Stochastic Gradient Descent
Kangqiao Liu
Liu Ziyin
Masahito Ueda
MLT
61
37
0
07 Dec 2020
Fine-tuning BERT for Low-Resource Natural Language Understanding via Active Learning
Daniel Grießhaber
J. Maucher
Ngoc Thang Vu
19
46
0
04 Dec 2020
Positive-Congruent Training: Towards Regression-Free Model Updates
Sijie Yan
Yuanjun Xiong
Kaustav Kundu
Shuo Yang
Siqi Deng
Meng Wang
Wei Xia
Stefano Soatto
BDL
16
49
0
18 Nov 2020
Chaos and Complexity from Quantum Neural Network: A study with Diffusion Metric in Machine Learning
S. Choudhury
Ankan Dutta
Debisree Ray
22
21
0
16 Nov 2020
Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D'Amour
Katherine A. Heller
D. Moldovan
Ben Adlam
B. Alipanahi
...
Kellie Webster
Steve Yadlowsky
T. Yun
Xiaohua Zhai
D. Sculley
OffRL
77
670
0
06 Nov 2020
Memorizing without overfitting: Bias, variance, and interpolation in over-parameterized models
J. Rocks
Pankaj Mehta
13
41
0
26 Oct 2020
Deep Learning is Singular, and That's Good
Daniel Murfet
Susan Wei
Biwei Huang
Hui Li
Jesse Gell-Redman
T. Quella
UQCV
24
26
0
22 Oct 2020
Regularizing Neural Networks via Adversarial Model Perturbation
Yaowei Zheng
Richong Zhang
Yongyi Mao
AAML
30
95
0
10 Oct 2020
Regularizing Dialogue Generation by Imitating Implicit Scenarios
Shaoxiong Feng
Xuancheng Ren
Hongshen Chen
Bin Sun
Kan Li
Xu Sun
18
20
0
05 Oct 2020
Sharpness-Aware Minimization for Efficiently Improving Generalization
Pierre Foret
Ariel Kleiner
H. Mobahi
Behnam Neyshabur
AAML
92
1,278
0
03 Oct 2020
Effective Regularization Through Loss-Function Metalearning
Santiago Gonzalez
Risto Miikkulainen
24
5
0
02 Oct 2020
Learning Rates as a Function of Batch Size: A Random Matrix Theory Approach to Neural Network Training
Diego Granziol
S. Zohren
Stephen J. Roberts
ODL
37
48
0
16 Jun 2020
Exploring the Vulnerability of Deep Neural Networks: A Study of Parameter Corruption
Xu Sun
Zhiyuan Zhang
Xuancheng Ren
Ruixuan Luo
Liangyou Li
22
39
0
10 Jun 2020
ADAHESSIAN: An Adaptive Second Order Optimizer for Machine Learning
Z. Yao
A. Gholami
Sheng Shen
Mustafa Mustafa
Kurt Keutzer
Michael W. Mahoney
ODL
11
273
0
01 Jun 2020
Pruning artificial neural networks: a way to find well-generalizing, high-entropy sharp minima
Enzo Tartaglione
Andrea Bragagnolo
Marco Grangetto
23
11
0
30 Apr 2020
Self-Augmentation: Generalizing Deep Networks to Unseen Classes for Few-Shot Learning
Jinhwan Seo
Hong G Jung
Seong-Whan Lee
SSL
12
39
0
01 Apr 2020
On Bayesian posterior mean estimators in imaging sciences and Hamilton-Jacobi Partial Differential Equations
Jérôme Darbon
G. P. Langlois
19
8
0
12 Mar 2020
Bayesian Deep Learning and a Probabilistic Perspective of Generalization
A. Wilson
Pavel Izmailov
UQCV
BDL
OOD
24
639
0
20 Feb 2020
Robust Reinforcement Learning via Adversarial training with Langevin Dynamics
Parameswaran Kamalaruban
Yu-ting Huang
Ya-Ping Hsieh
Paul Rolland
C. Shi
V. Cevher
23
59
0
14 Feb 2020
A Diffusion Theory For Deep Learning Dynamics: Stochastic Gradient Descent Exponentially Favors Flat Minima
Zeke Xie
Issei Sato
Masashi Sugiyama
ODL
20
17
0
10 Feb 2020
MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data
Quande Liu
Qi Dou
Lequan Yu
Pheng Ann Heng
OOD
71
274
0
09 Feb 2020