Exact solutions to the nonlinear dynamics of learning in deep linear neural networks

20 December 2013

Papers citing "Exact solutions to the nonlinear dynamics of learning in deep linear neural networks"


Title
The emergence of sparse attention: impact of data distribution and benefits of repetition Nicolas Zucchet Francesco d'Angelo Andrew Kyle Lampinen Stephanie C. Y. Chan 86 0 0 23 May 2025
Accelerating Learned Image Compression Through Modeling Neural Training Dynamics Yichi Zhang Zhihao Duan Yuning Huang Fengqing Zhu 137 0 0 23 May 2025
Sinusoidal Initialization, Time for a New Start Alberto Fernández-Hernández Jose I. Mestre Manuel F. Dolz Jose Duato Enrique S. Quintana-Ortí ODL AI4CE 112 0 0 19 May 2025
A Local Polyak-Lojasiewicz and Descent Lemma of Gradient Descent For Overparametrized Linear Models Ziqing Xu Hancheng Min Salma Tarmoun Enrique Mallada Rene Vidal 65 0 0 16 May 2025
Shrinkage Initialization for Smooth Learning of Neural Networks Miao Cheng Feiyan Zhou Hongwei Zou Limin Wang AI4CE 50 0 0 12 Apr 2025
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model Moritz A. Zanger Pascal R. van der Vaart Wendelin Bohmer M. Spaan UQCV BDL 357 1 0 14 Mar 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking) Yoonsoo Nam Seok Hyeong Lee Clementine Domine Yea Chan Park Charles London Wonyl Choi Niclas Goring Seungjai Lee AI4CE 104 0 0 28 Feb 2025
Training Large Neural Networks With Low-Dimensional Error Feedback Maher Hanut Jonathan Kadmon 76 1 0 27 Feb 2025
Stacking as Accelerated Gradient Descent Naman Agarwal Pranjal Awasthi Satyen Kale Eric Zhao ODL 94 2 0 20 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers Riccardo Rende Federica Gerace Alessandro Laio Sebastian Goldt 92 8 0 17 Feb 2025
Universal Sharpness Dynamics in Neural Network Training: Fixed Point Analysis, Edge of Stability, and Route to Chaos Dayal Singh Kalra Tianyu He M. Barkeshli 89 4 0 17 Feb 2025
On Vanishing Gradients, Over-Smoothing, and Over-Squashing in GNNs: Bridging Recurrent and Graph Learning Alvaro Arroyo Alessio Gravina Benjamin Gutteridge Federico Barbero Claudio Gallicchio Xiaowen Dong Michael M. Bronstein P. Vandergheynst 85 8 0 15 Feb 2025
Deep Linear Network Training Dynamics from Random Initialization: Data, Width, Depth, and Hyperparameter Transfer Blake Bordelon Cengiz Pehlevan AI4CE 116 1 0 04 Feb 2025
A theoretical framework for overfitting in energy-based modeling Giovanni Catania A. Decelle Cyril Furtlehner Beatriz Seoane 103 2 0 31 Jan 2025
Gradient Descent Converges Linearly to Flatter Minima than Gradient Flow in Shallow Linear Networks Pierfrancesco Beneventano Blake Woodworth MLT 73 1 0 15 Jan 2025
Singular Value Scaling: Efficient Generative Model Compression via Pruned Weights Refinement H. Kim Jaejun Yoo 85 0 0 23 Dec 2024
Pretraining with random noise for uncertainty calibration Jeonghwan Cheon Se-Bum Paik OnRL 91 1 0 23 Dec 2024
Theoretical characterisation of the Gauss-Newton conditioning in Neural Networks Jim Zhao Sidak Pal Singh Aurelien Lucchi AI4CE 78 0 0 04 Nov 2024
Bilinear Sequence Regression: A Model for Learning from Long Sequences of High-dimensional Tokens Vittorio Erba Emanuele Troiani Luca Biggio Antoine Maillard Lenka Zdeborová 122 1 0 24 Oct 2024
Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models Yuheng Lu Bingshuo Qian Caixia Yuan Huixing Jiang Xiaojie Wang CLL 58 0 0 22 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws Blake Bordelon Alexander B. Atanasov Cengiz Pehlevan 70 14 0 26 Sep 2024
From Lazy to Rich: Exact Learning Dynamics in Deep Linear Networks Clémentine Dominé Nicolas Anguita A. Proca Lukas Braun D. Kunin P. Mediano Andrew M. Saxe 67 3 0 22 Sep 2024
Remove Symmetries to Control Model Expressivity and Improve Optimization Liu Ziyin Yizhou Xu Isaac Chuang AAML 62 1 0 28 Aug 2024
InfoNCE: Identifying the Gap Between Theory and Practice E. Rusak Patrik Reizinger Attila Juhos Oliver Bringmann Roland S. Zimmermann Wieland Brendel 73 7 0 28 Jun 2024
Fusion-PSRO: Nash Policy Fusion for Policy Space Response Oracles Jiesong Lian Yucong Huang Chengdong Ma Mingzhi Wang Ying Wen Long Hu Yixue Hao 77 0 0 31 May 2024
Pretraining with Random Noise for Fast and Robust Learning without Weight Transport Jeonghwan Cheon Sang Wan Lee Se-Bum Paik OOD 339 2 0 27 May 2024
Cascade of phase transitions in the training of Energy-based models Dimitrios Bachtis Giulio Biroli A. Decelle Beatriz Seoane 52 4 0 23 May 2024
Early Directional Convergence in Deep Homogeneous Neural Networks for Small Initializations Akshay Kumar Jarvis Haupt ODL 57 3 0 12 Mar 2024
Learning time-scales in two-layers neural networks Raphael Berthier Andrea Montanari Kangjie Zhou 86 33 0 28 Feb 2023
KOALA: A Kalman Optimization Algorithm with Loss Adaptivity A. Davtyan Sepehr Sameni L. Cerkezi Givi Meishvili Adam Bielski Paolo Favaro ODL 93 2 0 07 Jul 2021
AMEIR: Automatic Behavior Modeling, Interaction Exploration and MLP Investigation in the Recommender System Pengyu Zhao Kecheng Xiao Yuanxing Zhang Kaigui Bian Wei Yan 63 16 0 10 Jun 2020
Two Routes to Scalable Credit Assignment without Weight Symmetry D. Kunin Aran Nayebi Javier Sagastuy-Breña Surya Ganguli Jonathan M. Bloom Daniel L. K. Yamins 86 33 0 28 Feb 2020
Attributed Sequence Embedding Zhongfang Zhuang Xiangnan Kong Elke A. Rundensteiner Jihane Zouaoui Aditya Arora 142 12 0 03 Nov 2019
Generalization in multitask deep neural classifiers: a statistical physics approach Tyler Lee A. Ndirango AI4CE 118 20 0 30 Oct 2019
All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation Di Xie Jiang Xiong Shiliang Pu 86 182 0 06 Mar 2017
An Analytical Formula of Population Gradient for two-layered ReLU network and its Applications in Convergence and Critical Point Analysis Yuandong Tian MLT 114 216 0 02 Mar 2017
Exponentially vanishing sub-optimal local minima in multilayer neural networks Daniel Soudry Elad Hoffer 108 97 0 19 Feb 2017
Big Neural Networks Waste Capacity Yann N. Dauphin Yoshua Bengio 70 84 0 16 Jan 2013
On the difficulty of training Recurrent Neural Networks Razvan Pascanu Tomas Mikolov Yoshua Bengio ODL 132 5,318 0 21 Nov 2012
Multi-column Deep Neural Networks for Image Classification D. Ciresan U. Meier Jürgen Schmidhuber 111 3,935 0 13 Feb 2012
Building high-level features using large scale unsupervised learning Quoc V. Le Marc'Aurelio Ranzato R. Monga M. Devin Kai Chen G. Corrado J. Dean A. Ng SSL OffRL CVBM 93 2,268 0 29 Dec 2011