A Bayesian Perspective on Generalization and Stochastic Gradient Descent

17 October 2017

Papers citing "A Bayesian Perspective on Generalization and Stochastic Gradient Descent"

Title | Authors | Tag | Counts | Date
Generalization through variance: how noise shapes inductive biases in diffusion models | John J. Vastola | DiffM | 393 3 0 | 16 Apr 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks | Pierfrancesco Beneventano, Blake Woodworth | MLT | 73 1 0 | 15 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit | Oleg Filatov, Jan Ebert, Jiangtao Wang, Stefan Kesselheim | - | 72 4 0 | 10 Jan 2025
How Does Critical Batch Size Scale in Pre-training? | Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, Sham Kakade | - | 106 15 0 | 29 Oct 2024
Continual learning with the neural tangent ensemble | Ari S. Benjamin, Christian Pehle, Kyle Daruwalla | UQCV | 96 0 0 | 30 Aug 2024
Variational Stochastic Gradient Descent for Deep Neural Networks | Haotian Chen, Anna Kuzina, Babak Esmaeili, Jakub M. Tomczak | - | 62 0 0 | 09 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks | Haiyun He, Christina Lee Yu | - | 75 5 0 | 04 Apr 2024
Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach | Ryo Karakida, S. Akaho, S. Amari | FedML | 106 143 0 | 04 Jun 2018
Three Factors Influencing Minima in SGD | Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, Amos Storkey | - | 67 459 0 | 13 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size | Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, Quoc V. Le | ODL | 93 990 0 | 01 Nov 2017
Generalization in Deep Learning | Kenji Kawaguchi, L. Kaelbling, Yoshua Bengio | ODL | 72 459 0 | 16 Oct 2017
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys | Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, R. Zecchina | ODL | 84 769 0 | 06 Nov 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 355 2,922 0 | 15 Sep 2016
PAC-Bayesian Theory Meets Bayesian Inference | Pascal Germain, Francis R. Bach, Alexandre Lacoste, Simon Lacoste-Julien | - | 54 182 0 | 27 May 2016
Hybrid Deterministic-Stochastic Methods for Data Fitting | M. Friedlander, Mark Schmidt | - | 129 387 0 | 13 Apr 2011