Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks (arXiv 2306.04251)
7 June 2023
F. Chen, D. Kunin, Atsushi Yamamura, Surya Ganguli
Papers citing "Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks" (13 of 13 papers shown)
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou
10 Feb 2025
Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries
Chris Kolb, T. Weber, Bernd Bischl, David Rügamer
04 Feb 2025
Remove Symmetries to Control Model Expressivity and Improve Optimization
Liu Ziyin, Yizhou Xu, Isaac Chuang (AAML)
28 Aug 2024
How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD
Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio (MLT)
17 Jun 2024
Large Learning Rates Improve Generalization: But How Large Are We Talking About?
E. Lobacheva, Eduard Pockonechnyy, M. Kodryan, Dmitry Vetrov (AI4CE)
19 Nov 2023
Layer-wise Linear Mode Connectivity
Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi (FedML, FAtt, MoMe)
13 Jul 2023
Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent
Liu Ziyin, Botao Li, Tomer Galanti, Masakuni Ueda
23 Mar 2023
What shapes the loss landscape of self-supervised learning?
Liu Ziyin, Ekdeep Singh Lubana, Masakuni Ueda, Hidenori Tanaka
02 Oct 2022
Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions
Arthur Jacot
29 Sep 2022
What Happens after SGD Reaches Zero Loss? --A Mathematical Framework
Zhiyuan Li, Tianhao Wang, Sanjeev Arora (MLT)
13 Oct 2021
Stochastic Training is Not Necessary for Generalization
Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein
29 Sep 2021
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka
08 Dec 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (ODL)
15 Sep 2016