
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks

arXiv 2306.04251, 7 June 2023
Feng Chen, Daniel Kunin, Atsushi Yamamura, Surya Ganguli

Papers citing "Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks"

13 of 13 papers shown:

  1. Do we really have to filter out random noise in pre-training data for language models? (Jinghan Ru, Yuxin Xie, Xianwei Zhuang, Yuguo Yin, Zhihui Guo, Zhiming Liu, Qianli Ren, Yuexian Zou; 10 Feb 2025)
  2. Deep Weight Factorization: Sparse Learning Through the Lens of Artificial Symmetries (Chris Kolb, T. Weber, Bernd Bischl, David Rügamer; 04 Feb 2025)
  3. Remove Symmetries to Control Model Expressivity and Improve Optimization (Liu Ziyin, Yizhou Xu, Isaac Chuang; 28 Aug 2024) [AAML]
  4. How Neural Networks Learn the Support is an Implicit Regularization Effect of SGD (Pierfrancesco Beneventano, Andrea Pinto, Tomaso A. Poggio; 17 Jun 2024) [MLT]
  5. Large Learning Rates Improve Generalization: But How Large Are We Talking About? (Ekaterina Lobacheva, Eduard Pockonechnyy, Maxim Kodryan, Dmitry Vetrov; 19 Nov 2023) [AI4CE]
  6. Layer-wise Linear Mode Connectivity (Linara Adilova, Maksym Andriushchenko, Michael Kamp, Asja Fischer, Martin Jaggi; 13 Jul 2023) [FedML, FAtt, MoMe]
  7. Type-II Saddles and Probabilistic Stability of Stochastic Gradient Descent (Liu Ziyin, Botao Li, Tomer Galanti, Masahito Ueda; 23 Mar 2023)
  8. What shapes the loss landscape of self-supervised learning? (Liu Ziyin, Ekdeep Singh Lubana, Masahito Ueda, Hidenori Tanaka; 02 Oct 2022)
  9. Implicit Bias of Large Depth Networks: a Notion of Rank for Nonlinear Functions (Arthur Jacot; 29 Sep 2022)
  10. What Happens after SGD Reaches Zero Loss? --A Mathematical Framework (Zhiyuan Li, Tianhao Wang, Sanjeev Arora; 13 Oct 2021) [MLT]
  11. Stochastic Training is Not Necessary for Generalization (Jonas Geiping, Micah Goldblum, Phillip E. Pope, Michael Moeller, Tom Goldstein; 29 Sep 2021)
  12. Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics (Daniel Kunin, Javier Sagastuy-Breña, Surya Ganguli, Daniel L. K. Yamins, Hidenori Tanaka; 08 Dec 2020)
  13. On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima (Nitish Shirish Keskar, Dheevatsa Mudigere, Jorge Nocedal, Mikhail Smelyanskiy, Ping Tak Peter Tang; 15 Sep 2016) [ODL]