Entropy-SGD: Biasing Gradient Descent Into Wide Valleys

6 November 2016

Pratik Chaudhari

A. Choromańska

Papers citing "Entropy-SGD: Biasing Gradient Descent Into Wide Valleys"

14 / 164 papers shown

Title | Authors | Tags | Counts | Date
Entropy-SGD optimizes the prior of a PAC-Bayes bound: Generalization properties of Entropy-SGD and data-dependent priors | Gintare Karolina Dziugaite, Daniel M. Roy | MLT | 30, 144, 0 | 26 Dec 2017
On Connecting Stochastic Gradient MCMC and Differential Privacy | Bai Li, Changyou Chen, Hao Liu, Lawrence Carin | | 41, 38, 0 | 25 Dec 2017
Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks | Ziming Zhang, M. Brand | | 26, 70, 0 | 20 Nov 2017
Meta-Learning by Adjusting Priors Based on Extended PAC-Bayes Theory | Ron Amit, Ron Meir | BDL, MLT | 32, 173, 0 | 03 Nov 2017
Stochastic Backward Euler: An Implicit Gradient Descent Algorithm for $k$-means Clustering | Penghang Yin, Minh Pham, Adam M. Oberman, Stanley Osher | FedML | 40, 15, 0 | 21 Oct 2017
Exploring Generalization in Deep Learning | Behnam Neyshabur, Srinadh Bhojanapalli, David A. McAllester, Nathan Srebro | FAtt | 68, 1,235, 0 | 27 Jun 2017
Proximal Backpropagation | Thomas Frerix, Thomas Möllenhoff, Michael Möller, Daniel Cremers | | 23, 31, 0 | 14 Jun 2017
Deep Relaxation: partial differential equations for optimizing deep neural networks | Pratik Chaudhari, Adam M. Oberman, Stanley Osher, Stefano Soatto, G. Carlier | | 27, 153, 0 | 17 Apr 2017
Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data | Gintare Karolina Dziugaite, Daniel M. Roy | | 50, 799, 0 | 31 Mar 2017
Sharp Minima Can Generalize For Deep Nets | Laurent Dinh, Razvan Pascanu, Samy Bengio, Yoshua Bengio | ODL | 46, 755, 0 | 15 Mar 2017
Data-Dependent Stability of Stochastic Gradient Descent | Ilja Kuzborskij, Christoph H. Lampert | MLT | 9, 165, 0 | 05 Mar 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 308, 2,890, 0 | 15 Sep 2016
The Loss Surfaces of Multilayer Networks | A. Choromańska, Mikael Henaff, Michaël Mathieu, Gerard Ben Arous, Yann LeCun | ODL | 183, 1,185, 0 | 30 Nov 2014
MCMC using Hamiltonian dynamics | Radford M. Neal | | 185, 3,266, 0 | 09 Jun 2012