ResearchTrend.AI
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

6 November 2016
Pratik Chaudhari
A. Choromańska
Stefano Soatto
Yann LeCun
Carlo Baldassi
C. Borgs
J. Chayes
Levent Sagun
R. Zecchina
    ODL

Papers citing "Entropy-SGD: Biasing Gradient Descent Into Wide Valleys"

50 / 164 papers shown
MS-Net: Multi-Site Network for Improving Prostate Segmentation with Heterogeneous MRI Data
Quande Liu
Qi Dou
Lequan Yu
Pheng Ann Heng
OOD
71
274
0
09 Feb 2020
'Place-cell' emergence and learning of invariant data with restricted Boltzmann machines: breaking and dynamical restoration of continuous symmetries in the weight space
Moshir Harsh
J. Tubiana
Simona Cocco
R. Monasson
9
14
0
30 Dec 2019
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
19
168
0
19 Dec 2019
Improving Model Robustness Using Causal Knowledge
T. Kyono
M. Schaar
OOD
22
12
0
27 Nov 2019
Information-Theoretic Local Minima Characterization and Regularization
Zhiwei Jia
Hao Su
27
19
0
19 Nov 2019
Improved Sample Complexities for Deep Networks and Robust Classification via an All-Layer Margin
Colin Wei
Tengyu Ma
AAML
OOD
36
85
0
09 Oct 2019
GradVis: Visualization and Second Order Analysis of Optimization Surfaces during the Training of Deep Neural Networks
Avraam Chatzimichailidis
Franz-Josef Pfreundt
N. Gauger
J. Keuper
19
10
0
26 Sep 2019
EEG-Based Driver Drowsiness Estimation Using Feature Weighted Episodic Training
Yuqi Cui
Yifan Xu
Dongrui Wu
13
62
0
25 Sep 2019
Understanding and Robustifying Differentiable Architecture Search
Arber Zela
T. Elsken
Tonmoy Saikia
Yassine Marrakchi
Thomas Brox
Frank Hutter
OOD
AAML
66
366
0
20 Sep 2019
Learned imaging with constraints and uncertainty quantification
Felix J. Herrmann
Ali Siahkoohi
G. Rizzuti
UQCV
22
23
0
13 Sep 2019
Knowledge Transfer Graph for Deep Collaborative Learning
Soma Minami
Tsubasa Hirakawa
Takayoshi Yamashita
H. Fujiyoshi
28
9
0
10 Sep 2019
Visualizing and Understanding the Effectiveness of BERT
Y. Hao
Li Dong
Furu Wei
Ke Xu
22
181
0
15 Aug 2019
On the Existence of Simpler Machine Learning Models
Lesia Semenova
Cynthia Rudin
Ronald E. Parr
26
85
0
05 Aug 2019
Hessian based analysis of SGD for Deep Nets: Dynamics and Generalization
Xinyan Li
Qilong Gu
Yingxue Zhou
Tiancong Chen
A. Banerjee
ODL
36
51
0
24 Jul 2019
Post-synaptic potential regularization has potential
Enzo Tartaglione
Daniele Perlo
Marco Grangetto
BDL
AAML
27
6
0
19 Jul 2019
Chaining Meets Chain Rule: Multilevel Entropic Regularization and Training of Neural Nets
Amir-Reza Asadi
Emmanuel Abbe
BDL
AI4CE
34
13
0
26 Jun 2019
Learning to Forget for Meta-Learning
Sungyong Baik
Seokil Hong
Kyoung Mu Lee
CLL
KELM
19
87
0
13 Jun 2019
Limitations of the Empirical Fisher Approximation for Natural Gradient Descent
Frederik Kunstner
Lukas Balles
Philipp Hennig
21
207
0
29 May 2019
Gradient Descent with Early Stopping is Provably Robust to Label Noise for Overparameterized Neural Networks
Mingchen Li
Mahdi Soltanolkotabi
Samet Oymak
NoLa
47
351
0
27 Mar 2019
Multilingual Neural Machine Translation with Knowledge Distillation
Xu Tan
Yi Ren
Di He
Tao Qin
Zhou Zhao
Tie-Yan Liu
20
248
0
27 Feb 2019
An Empirical Study of Large-Batch Stochastic Gradient Descent with Structured Covariance Noise
Yeming Wen
Kevin Luk
Maxime Gazeau
Guodong Zhang
Harris Chan
Jimmy Ba
ODL
20
22
0
21 Feb 2019
Investigating Generalisation in Continuous Deep Reinforcement Learning
Chenyang Zhao
Olivier Sigaud
F. Stulp
Timothy M. Hospedales
OffRL
14
48
0
19 Feb 2019
A Tail-Index Analysis of Stochastic Gradient Noise in Deep Neural Networks
Umut Simsekli
Levent Sagun
Mert Gurbuzbalaban
20
237
0
18 Jan 2019
An Empirical Study of Example Forgetting during Deep Neural Network Learning
Mariya Toneva
Alessandro Sordoni
Rémi Tachet des Combes
Adam Trischler
Yoshua Bengio
Geoffrey J. Gordon
46
712
0
12 Dec 2018
Gradient Descent Happens in a Tiny Subspace
Guy Gur-Ari
Daniel A. Roberts
Ethan Dyer
28
228
0
12 Dec 2018
Stagewise Training Accelerates Convergence of Testing Error Over SGD
Zhuoning Yuan
Yan Yan
R. L. Jin
Tianbao Yang
52
11
0
10 Dec 2018
Wireless Network Intelligence at the Edge
Jihong Park
S. Samarakoon
M. Bennis
Mérouane Debbah
21
518
0
07 Dec 2018
Simulated Tempering Langevin Monte Carlo II: An Improved Proof using Soft Markov Chain Decomposition
Rong Ge
Holden Lee
Andrej Risteski
14
27
0
29 Nov 2018
Single-Label Multi-Class Image Classification by Deep Logistic Regression
Qi Dong
Xiatian Zhu
S. Gong
8
33
0
20 Nov 2018
Sequenced-Replacement Sampling for Deep Learning
C. Ho
Dae Hoon Park
Wei Yang
Yi Chang
24
0
0
19 Oct 2018
Regularization Matters: Generalization and Optimization of Neural Nets v.s. their Induced Kernel
Colin Wei
J. Lee
Qiang Liu
Tengyu Ma
20
243
0
12 Oct 2018
Implicit Self-Regularization in Deep Neural Networks: Evidence from Random Matrix Theory and Implications for Learning
Charles H. Martin
Michael W. Mahoney
AI4CE
35
190
0
02 Oct 2018
Interpreting Adversarial Robustness: A View from Decision Surface in Input Space
Fuxun Yu
Chenchen Liu
Yanzhi Wang
Liang Zhao
Xiang Chen
AAML
OOD
31
27
0
29 Sep 2018
Don't Use Large Mini-Batches, Use Local SGD
Tao R. Lin
Sebastian U. Stich
Kumar Kshitij Patel
Martin Jaggi
57
429
0
22 Aug 2018
Ensemble Kalman Inversion: A Derivative-Free Technique For Machine Learning Tasks
Nikola B. Kovachki
Andrew M. Stuart
BDL
42
136
0
10 Aug 2018
Optimization of neural networks via finite-value quantum fluctuations
Masayuki Ohzeki
Shuntaro Okada
Masayoshi Terabe
S. Taguchi
19
21
0
01 Jul 2018
Understanding Dropout as an Optimization Trick
Sangchul Hahn
Heeyoul Choi
ODL
13
34
0
26 Jun 2018
Persistent Hidden States and Nonlinear Transformation for Long Short-Term Memory
Heeyoul Choi
19
12
0
22 Jun 2018
Laplacian Smoothing Gradient Descent
Stanley Osher
Bao Wang
Penghang Yin
Xiyang Luo
Farzin Barekat
Minh Pham
A. Lin
ODL
22
43
0
17 Jun 2018
The committee machine: Computational to statistical gaps in learning a two-layers neural network
Benjamin Aubin
Antoine Maillard
Jean Barbier
Florent Krzakala
N. Macris
Lenka Zdeborová
41
104
0
14 Jun 2018
Understanding Batch Normalization
Johan Bjorck
Carla P. Gomes
B. Selman
Kilian Q. Weinberger
18
593
0
01 Jun 2018
SmoothOut: Smoothing Out Sharp Minima to Improve Generalization in Deep Learning
W. Wen
Yandan Wang
Feng Yan
Cong Xu
Chunpeng Wu
Yiran Chen
H. Li
24
50
0
21 May 2018
On Visual Hallmarks of Robustness to Adversarial Malware
Alex Huang
Abdullah Al-Dujaili
Erik Hemberg
Una-May O’Reilly
AAML
25
7
0
09 May 2018
Non-Vacuous Generalization Bounds at the ImageNet Scale: A PAC-Bayesian Compression Approach
Wenda Zhou
Victor Veitch
Morgane Austern
Ryan P. Adams
Peter Orbanz
38
209
0
16 Apr 2018
Comparing Dynamics: Deep Neural Networks versus Glassy Systems
Marco Baity-Jesi
Levent Sagun
Mario Geiger
S. Spigler
Gerard Ben Arous
C. Cammarota
Yann LeCun
M. Wyart
Giulio Biroli
AI4CE
33
113
0
19 Mar 2018
On the insufficiency of existing momentum schemes for Stochastic Optimization
Rahul Kidambi
Praneeth Netrapalli
Prateek Jain
Sham Kakade
ODL
22
117
0
15 Mar 2018
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML
MoMe
39
1,617
0
14 Mar 2018
Understanding and Enhancing the Transferability of Adversarial Examples
Lei Wu
Zhanxing Zhu
Cheng Tai
E. Weinan
AAML
SILM
28
96
0
27 Feb 2018
Stronger generalization bounds for deep nets via a compression approach
Sanjeev Arora
Rong Ge
Behnam Neyshabur
Yi Zhang
MLT
AI4CE
23
630
0
14 Feb 2018
Visualizing the Loss Landscape of Neural Nets
Hao Li
Zheng Xu
Gavin Taylor
Christoph Studer
Tom Goldstein
98
1,844
0
28 Dec 2017