arXiv:1710.06451
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
17 October 2017
Samuel L. Smith
Quoc V. Le
BDL
Papers citing "A Bayesian Perspective on Generalization and Stochastic Gradient Descent" (15 papers)
Generalization through variance: how noise shapes inductive biases in diffusion models
John J. Vastola
DiffM
393
3
0
16 Apr 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks
Pierfrancesco Beneventano
Blake Woodworth
MLT
73
1
0
15 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
72
4
0
10 Jan 2025
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
106
15
0
29 Oct 2024
Continual learning with the neural tangent ensemble
Ari S. Benjamin
Christian Pehle
Kyle Daruwalla
UQCV
96
0
0
30 Aug 2024
Variational Stochastic Gradient Descent for Deep Neural Networks
Haotian Chen
Anna Kuzina
Babak Esmaeili
Jakub M. Tomczak
62
0
0
09 Apr 2024
Information-Theoretic Generalization Bounds for Deep Neural Networks
Haiyun He
Christina Lee Yu
75
5
0
04 Apr 2024
Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida
S. Akaho
S. Amari
FedML
106
143
0
04 Jun 2018
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
67
459
0
13 Nov 2017
Don't Decay the Learning Rate, Increase the Batch Size
Samuel L. Smith
Pieter-Jan Kindermans
Chris Ying
Quoc V. Le
ODL
93
990
0
01 Nov 2017
Generalization in Deep Learning
Kenji Kawaguchi
L. Kaelbling
Yoshua Bengio
ODL
72
459
0
16 Oct 2017
Entropy-SGD: Biasing Gradient Descent Into Wide Valleys
Pratik Chaudhari
A. Choromańska
Stefano Soatto
Yann LeCun
Carlo Baldassi
C. Borgs
J. Chayes
Levent Sagun
R. Zecchina
ODL
84
769
0
06 Nov 2016
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
355
2,922
0
15 Sep 2016
PAC-Bayesian Theory Meets Bayesian Inference
Pascal Germain
Francis R. Bach
Alexandre Lacoste
Simon Lacoste-Julien
54
182
0
27 May 2016
Hybrid Deterministic-Stochastic Methods for Data Fitting
M. Friedlander
Mark Schmidt
129
387
0
13 Apr 2011