Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks

16 February 2018

Papers citing "Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks"

32 / 32 papers shown

Title
External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation Mingfu Liang Xi Liu Rong Jin B. Liu Qiuling Suo ... Bo Long Wenlin Chen Rocky Liu Santanu Kolay Hao Li 46 2 0 20 Feb 2025
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport Raphael Barboni Gabriel Peyré François-Xavier Vialard 37 3 0 19 Mar 2024
On a continuous time model of gradient descent dynamics and instability in deep learning Mihaela Rosca Yan Wu Chongli Qin Benoit Dherin 16 6 0 03 Feb 2023
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models Yufeng Zhang Boyi Liu Qi Cai Lingxiao Wang Zhaoran Wang 53 11 0 30 Dec 2022
On skip connections and normalisation layers in deep optimisation L. MacDonald Jack Valmadre Hemanth Saratchandran Simon Lucey ODL 19 1 0 10 Oct 2022
Deep Linear Networks can Benignly Overfit when Shallow Ones Do Niladri S. Chatterji Philip M. Long 23 8 0 19 Sep 2022
Do Residual Neural Networks discretize Neural Ordinary Differential Equations? Michael E. Sander Pierre Ablin Gabriel Peyré 32 25 0 29 May 2022
On Feature Learning in Neural Networks with Global Convergence Guarantees Zhengdao Chen Eric Vanden-Eijnden Joan Bruna MLT 36 13 0 22 Apr 2022
Convergence of gradient descent for deep neural networks S. Chatterjee ODL 21 20 0 30 Mar 2022
Architecture Matters in Continual Learning Seyed Iman Mirzadeh Arslan Chaudhry Dong Yin Timothy Nguyen Razvan Pascanu Dilan Görür Mehrdad Farajtabar OOD KELM 116 58 0 01 Feb 2022
Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks Noam Razin Asaf Maman Nadav Cohen 46 29 0 27 Jan 2022
The loss landscape of deep linear neural networks: a second-order analysis E. M. Achour François Malgouyres Sébastien Gerchinovitz ODL 24 9 0 28 Jul 2021
Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction Dominik Stöger Mahdi Soltanolkotabi ODL 42 75 0 28 Jun 2021
Understanding self-supervised Learning Dynamics without Contrastive Pairs Yuandong Tian Xinlei Chen Surya Ganguli SSL 138 281 0 12 Feb 2021
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks Asaf Noy Yi Tian Xu Y. Aflalo Lihi Zelnik-Manor R. L. Jin 31 3 0 12 Jan 2021
On the linearity of large non-linear models: when and why the tangent kernel is constant Chaoyue Liu Libin Zhu M. Belkin 21 140 0 02 Oct 2020
Deep matrix factorizations Pierre De Handschutter Nicolas Gillis Xavier Siebert BDL 28 40 0 01 Oct 2020
Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't E. Weinan Chao Ma Stephan Wojtowytsch Lei Wu AI4CE 22 133 0 22 Sep 2020
Implicit Regularization in Deep Learning May Not Be Explainable by Norms Noam Razin Nadav Cohen 24 155 0 13 May 2020
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth Yiping Lu Chao Ma Yulong Lu Jianfeng Lu Lexing Ying MLT 39 78 0 11 Mar 2020
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks Chaoyue Liu Libin Zhu M. Belkin ODL 4 247 0 29 Feb 2020
Optimization for deep learning: theory and algorithms Ruoyu Sun ODL 14 168 0 19 Dec 2019
Global Convergence of Gradient Descent for Deep Linear Residual Networks Lei Wu Qingcan Wang Chao Ma ODL AI4CE 25 22 0 02 Nov 2019
Implicit Regularization in Deep Matrix Factorization Sanjeev Arora Nadav Cohen Wei Hu Yuping Luo AI4CE 24 491 0 31 May 2019
Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections E. Weinan Chao Ma Qingcan Wang Lei Wu MLT 27 22 0 10 Apr 2019
Every Local Minimum Value is the Global Minimum Value of Induced Model in Non-convex Machine Learning Kenji Kawaguchi Jiaoyang Huang L. Kaelbling AAML 16 18 0 07 Apr 2019
Width Provably Matters in Optimization for Deep Linear Neural Networks S. Du Wei Hu 16 93 0 24 Jan 2019
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks Difan Zou Yuan Cao Dongruo Zhou Quanquan Gu ODL 22 446 0 21 Nov 2018
On the Convergence Rate of Training Recurrent Neural Networks Zeyuan Allen-Zhu Yuanzhi Li Zhao-quan Song 18 191 0 29 Oct 2018
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks Sanjeev Arora Nadav Cohen Noah Golowich Wei Hu 18 281 0 04 Oct 2018
Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks Ohad Shamir 32 45 0 23 Sep 2018
Representing smooth functions as compositions of near-identity functions with implications for deep network optimization Peter L. Bartlett S. Evans Philip M. Long 73 31 0 13 Apr 2018