ResearchTrend.AI
© 2025 ResearchTrend.AI. All rights reserved.

A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith, Quoc V. Le
17 October 2017 · arXiv:1710.06451 [BDL]

Papers citing "A Bayesian Perspective on Generalization and Stochastic Gradient Descent"

13 / 13 papers shown.

1. "Generalization through variance: how noise shapes inductive biases in diffusion models" by John J. Vastola (16 Apr 2025) [DiffM]
2. "Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks" by Pierfrancesco Beneventano and Blake Woodworth (15 Jan 2025) [MLT]
3. "Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit" by Oleg Filatov, Jan Ebert, Jiangtao Wang, and Stefan Kesselheim (10 Jan 2025)
4. "How Does Critical Batch Size Scale in Pre-training?" by Hanlin Zhang, Depen Morwani, Nikhil Vyas, Jingfeng Wu, Difan Zou, Udaya Ghai, Dean Phillips Foster, and Sham Kakade (29 Oct 2024)
5. "Continual learning with the neural tangent ensemble" by Ari S. Benjamin, Christian Pehle, and Kyle Daruwalla (30 Aug 2024) [UQCV]
6. "Variational Stochastic Gradient Descent for Deep Neural Networks" by Haotian Chen, Anna Kuzina, Babak Esmaeili, and Jakub M. Tomczak (09 Apr 2024)
7. "Information-Theoretic Generalization Bounds for Deep Neural Networks" by Haiyun He and Christina Lee Yu (04 Apr 2024)
8. "Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach" by Ryo Karakida, S. Akaho, and S. Amari (04 Jun 2018) [FedML]
9. "Three Factors Influencing Minima in SGD" by Stanislaw Jastrzebski, Zachary Kenton, Devansh Arpit, Nicolas Ballas, Asja Fischer, Yoshua Bengio, and Amos Storkey (13 Nov 2017)
10. "Don't Decay the Learning Rate, Increase the Batch Size" by Samuel L. Smith, Pieter-Jan Kindermans, Chris Ying, and Quoc V. Le (01 Nov 2017) [ODL]
11. "Entropy-SGD: Biasing Gradient Descent Into Wide Valleys" by Pratik Chaudhari, A. Choromańska, Stefano Soatto, Yann LeCun, Carlo Baldassi, C. Borgs, J. Chayes, Levent Sagun, and R. Zecchina (06 Nov 2016) [ODL]
12. "PAC-Bayesian Theory Meets Bayesian Inference" by Pascal Germain, Francis R. Bach, Alexandre Lacoste, and Simon Lacoste-Julien (27 May 2016)
13. "Hybrid Deterministic-Stochastic Methods for Data Fitting" by M. Friedlander and Mark Schmidt (13 Apr 2011)