A spring-block theory of feature learning in deep neural networks
28 July 2024
Chengzhi Shi
Liming Pan
Ivan Dokmanić
AI4CE
arXiv: 2407.19353
Papers citing "A spring-block theory of feature learning in deep neural networks" (49 papers shown)
The boundary of neural network trainability is fractal
Jascha Narain Sohl-Dickstein
52
8
0
09 Feb 2024
Asymptotics of feature learning in two-layer networks after one gradient-step
Hugo Cui
Luca Pesce
Yatin Dandi
Florent Krzakala
Yue M. Lu
Lenka Zdeborová
Bruno Loureiro
MLT
88
19
0
07 Feb 2024
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
Yatin Dandi
Emanuele Troiani
Luca Arnaboldi
Luca Pesce
Lenka Zdeborová
Florent Krzakala
MLT
78
29
0
05 Feb 2024
A Dynamical Model of Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
84
41
0
02 Feb 2024
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
Yefan Zhou
Tianyu Pang
Keqin Liu
Charles H. Martin
Michael W. Mahoney
Yaoqing Yang
95
11
0
01 Dec 2023
On the different regimes of Stochastic Gradient Descent
Antonio Sclocchi
Matthieu Wyart
49
20
0
19 Sep 2023
A Neural Collapse Perspective on Feature Evolution in Graph Neural Networks
Vignesh Kothapalli
Tom Tirer
Joan Bruna
61
13
0
04 Jul 2023
Stochastic Collapse: How Gradient Noise Attracts SGD Dynamics Towards Simpler Subnetworks
F. Chen
D. Kunin
Atsushi Yamamura
Surya Ganguli
80
27
0
07 Jun 2023
The Law of Parsimony in Gradient Descent for Learning Deep Linear Networks
Can Yaras
Peng Wang
Wei Hu
Zhihui Zhu
Laura Balzano
Qing Qu
63
18
0
01 Jun 2023
A Rainbow in Deep Network Black Boxes
Florentin Guth
Brice Ménard
G. Rochette
S. Mallat
74
11
0
29 May 2023
Stochastic Modified Equations and Dynamics of Dropout Algorithm
Zhongwang Zhang
Yuqing Li
Yaoyu Zhang
Z. Xu
41
9
0
25 May 2023
Phase transitions in the mini-batch size for sparse and dense two-layer neural networks
Raffaele Marino
F. Ricci-Tersenghi
52
15
0
10 May 2023
Inversion dynamics of class manifolds in deep learning reveals tradeoffs underlying generalisation
Simone Ciceri
Lorenzo Cassani
Matteo Osella
P. Rotondo
P. Pizzochero
M. Gherardi
54
7
0
09 Mar 2023
Injectivity of ReLU networks: perspectives from statistical physics
Antoine Maillard
Afonso S. Bandeira
David Belius
Ivan Dokmanić
S. Nakajima
44
5
0
27 Feb 2023
From high-dimensional & mean-field dynamics to dimensionless ODEs: A unifying approach to SGD in two-layers networks
Luca Arnaboldi
Ludovic Stephan
Florent Krzakala
Bruno Loureiro
MLT
62
33
0
12 Feb 2023
Homophily modulates double descent generalization in graph convolution networks
Chengzhi Shi
Liming Pan
Hong Hu
Ivan Dokmanić
53
9
0
26 Dec 2022
A Law of Data Separation in Deep Learning
Hangfeng He
Weijie J. Su
OOD
71
41
0
31 Oct 2022
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
Gerard Ben Arous
Reza Gheissari
Aukosh Jagannath
89
58
0
08 Jun 2022
Stochastic gradient descent introduces an effective landscape-dependent regularization favoring flat solutions
Ning Yang
Chao Tang
Yuhai Tu
MLT
27
21
0
02 Jun 2022
Self-Consistent Dynamical Field Theory of Kernel Evolution in Wide Neural Networks
Blake Bordelon
Cengiz Pehlevan
MLT
56
82
0
19 May 2022
Feature Learning and Signal Propagation in Deep Neural Networks
Yizhang Lou
Chris Mingard
Yoonsoo Nam
Soufiane Hayou
MDE
58
18
0
22 Oct 2021
Unveiling the structure of wide flat minima in neural networks
Carlo Baldassi
Clarissa Lauditi
Enrico M. Malatesta
Gabriele Perugini
R. Zecchina
51
34
0
02 Jul 2021
Label Noise SGD Provably Prefers Flat Global Minimizers
Alexandru Damian
Tengyu Ma
Jason D. Lee
NoLa
97
119
0
11 Jun 2021
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training
Cong Fang
Hangfeng He
Qi Long
Weijie J. Su
FAtt
147
170
0
29 Jan 2021
Statistical Mechanics of Deep Linear Neural Networks: The Back-Propagating Kernel Renormalization
Qianyi Li
H. Sompolinsky
131
72
0
07 Dec 2020
Prevalence of Neural Collapse during the terminal phase of deep learning training
Vardan Papyan
Xuemei Han
D. Donoho
184
574
0
18 Aug 2020
Phase diagram for two-layer ReLU neural networks at infinite-width limit
Tao Luo
Zhi-Qin John Xu
Zheng Ma
Yaoyu Zhang
50
61
0
15 Jul 2020
What Do Neural Networks Learn When Trained With Random Labels?
Hartmut Maennel
Ibrahim Alabdulmohsin
Ilya O. Tolstikhin
R. Baldock
Olivier Bousquet
Sylvain Gelly
Daniel Keysers
FedML
140
89
0
18 Jun 2020
The Hidden Convex Optimization Landscape of Two-Layer ReLU Neural Networks: an Exact Characterization of the Optimal Solutions
Yifei Wang
Jonathan Lacotte
Mert Pilanci
MLT
50
27
0
10 Jun 2020
On Exact Computation with an Infinitely Wide Neural Net
Sanjeev Arora
S. Du
Wei Hu
Zhiyuan Li
Ruslan Salakhutdinov
Ruosong Wang
209
922
0
26 Apr 2019
On Lazy Training in Differentiable Programming
Lénaïc Chizat
Edouard Oyallon
Francis R. Bach
102
833
0
19 Dec 2018
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
Sanjeev Arora
Nadav Cohen
Noah Golowich
Wei Hu
110
290
0
04 Oct 2018
Geometry of energy landscapes and the optimizability of deep neural networks
Simon Becker
Yao Zhang
A. Lee
32
30
0
01 Aug 2018
Neural Tangent Kernel: Convergence and Generalization in Neural Networks
Arthur Jacot
Franck Gabriel
Clément Hongler
252
3,194
0
20 Jun 2018
On the Global Convergence of Gradient Descent for Over-parameterized Models using Optimal Transport
Lénaïc Chizat
Francis R. Bach
OT
200
735
0
24 May 2018
A Mean Field View of the Landscape of Two-Layers Neural Networks
Song Mei
Andrea Montanari
Phan-Minh Nguyen
MLT
81
858
0
18 Apr 2018
On the Optimization of Deep Networks: Implicit Acceleration by Overparameterization
Sanjeev Arora
Nadav Cohen
Elad Hazan
97
483
0
19 Feb 2018
Three Factors Influencing Minima in SGD
Stanislaw Jastrzebski
Zachary Kenton
Devansh Arpit
Nicolas Ballas
Asja Fischer
Yoshua Bengio
Amos Storkey
76
463
0
13 Nov 2017
Rethinking generalization requires revisiting old ideas: statistical mechanics approaches and complex learning behavior
Charles H. Martin
Michael W. Mahoney
AI4CE
47
64
0
26 Oct 2017
A Bayesian Perspective on Generalization and Stochastic Gradient Descent
Samuel L. Smith
Quoc V. Le
BDL
61
251
0
17 Oct 2017
Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour
Priya Goyal
Piotr Dollár
Ross B. Girshick
P. Noordhuis
Lukasz Wesolowski
Aapo Kyrola
Andrew Tulloch
Yangqing Jia
Kaiming He
3DH
120
3,675
0
08 Jun 2017
Exponential expressivity in deep neural networks through transient chaos
Ben Poole
Subhaneil Lahiri
M. Raghu
Jascha Narain Sohl-Dickstein
Surya Ganguli
88
591
0
16 Jun 2016
Toward Deeper Understanding of Neural Networks: The Power of Initialization and a Dual View on Expressivity
Amit Daniely
Roy Frostig
Y. Singer
156
343
0
18 Feb 2016
Stochastic modified equations and adaptive stochastic gradient algorithms
Qianxiao Li
Cheng Tai
Weinan E
59
284
0
19 Nov 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
439
43,277
0
11 Feb 2015
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
VLM
298
18,587
0
06 Feb 2015
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
1.6K
150,006
0
22 Dec 2014
Training Convolutional Networks with Noisy Labels
Sainbayar Sukhbaatar
Joan Bruna
Manohar Paluri
Lubomir D. Bourdev
Rob Fergus
NoLa
89
272
0
09 Jun 2014
Visualizing and Understanding Convolutional Networks
Matthew D. Zeiler
Rob Fergus
FAtt
SSL
563
15,874
0
12 Nov 2013