ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1802.06093
  4. Cited By
Gradient descent with identity initialization efficiently learns
  positive definite linear transformations by deep residual networks

Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks

16 February 2018
Peter L. Bartlett
D. Helmbold
Philip M. Long
ArXivPDFHTML

Papers citing "Gradient descent with identity initialization efficiently learns positive definite linear transformations by deep residual networks"

32 / 32 papers shown
Title
External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation
External Large Foundation Model: How to Efficiently Serve Trillions of Parameters for Online Ads Recommendation
Mingfu Liang
Xi Liu
Rong Jin
B. Liu
Qiuling Suo
...
Bo Long
Wenlin Chen
Rocky Liu
Santanu Kolay
Hao Li
46
2
0
20 Feb 2025
Understanding the training of infinitely deep and wide ResNets with
  Conditional Optimal Transport
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
Raphael Barboni
Gabriel Peyré
Franccois-Xavier Vialard
37
3
0
19 Mar 2024
On a continuous time model of gradient descent dynamics and instability
  in deep learning
On a continuous time model of gradient descent dynamics and instability in deep learning
Mihaela Rosca
Yan Wu
Chongli Qin
Benoit Dherin
16
6
0
03 Feb 2023
An Analysis of Attention via the Lens of Exchangeability and Latent
  Variable Models
An Analysis of Attention via the Lens of Exchangeability and Latent Variable Models
Yufeng Zhang
Boyi Liu
Qi Cai
Lingxiao Wang
Zhaoran Wang
53
11
0
30 Dec 2022
On skip connections and normalisation layers in deep optimisation
On skip connections and normalisation layers in deep optimisation
L. MacDonald
Jack Valmadre
Hemanth Saratchandran
Simon Lucey
ODL
19
1
0
10 Oct 2022
Deep Linear Networks can Benignly Overfit when Shallow Ones Do
Deep Linear Networks can Benignly Overfit when Shallow Ones Do
Niladri S. Chatterji
Philip M. Long
23
8
0
19 Sep 2022
Do Residual Neural Networks discretize Neural Ordinary Differential
  Equations?
Do Residual Neural Networks discretize Neural Ordinary Differential Equations?
Michael E. Sander
Pierre Ablin
Gabriel Peyré
32
25
0
29 May 2022
On Feature Learning in Neural Networks with Global Convergence
  Guarantees
On Feature Learning in Neural Networks with Global Convergence Guarantees
Zhengdao Chen
Eric Vanden-Eijnden
Joan Bruna
MLT
36
13
0
22 Apr 2022
Convergence of gradient descent for deep neural networks
Convergence of gradient descent for deep neural networks
S. Chatterjee
ODL
21
20
0
30 Mar 2022
Architecture Matters in Continual Learning
Architecture Matters in Continual Learning
Seyed Iman Mirzadeh
Arslan Chaudhry
Dong Yin
Timothy Nguyen
Razvan Pascanu
Dilan Görür
Mehrdad Farajtabar
OOD
KELM
116
58
0
01 Feb 2022
Implicit Regularization in Hierarchical Tensor Factorization and Deep
  Convolutional Neural Networks
Implicit Regularization in Hierarchical Tensor Factorization and Deep Convolutional Neural Networks
Noam Razin
Asaf Maman
Nadav Cohen
46
29
0
27 Jan 2022
The loss landscape of deep linear neural networks: a second-order
  analysis
The loss landscape of deep linear neural networks: a second-order analysis
E. M. Achour
Franccois Malgouyres
Sébastien Gerchinovitz
ODL
24
9
0
28 Jul 2021
Small random initialization is akin to spectral learning: Optimization
  and generalization guarantees for overparameterized low-rank matrix
  reconstruction
Small random initialization is akin to spectral learning: Optimization and generalization guarantees for overparameterized low-rank matrix reconstruction
Dominik Stöger
Mahdi Soltanolkotabi
ODL
42
75
0
28 Jun 2021
Understanding self-supervised Learning Dynamics without Contrastive
  Pairs
Understanding self-supervised Learning Dynamics without Contrastive Pairs
Yuandong Tian
Xinlei Chen
Surya Ganguli
SSL
138
281
0
12 Feb 2021
A Convergence Theory Towards Practical Over-parameterized Deep Neural
  Networks
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks
Asaf Noy
Yi Tian Xu
Y. Aflalo
Lihi Zelnik-Manor
R. L. Jin
31
3
0
12 Jan 2021
On the linearity of large non-linear models: when and why the tangent
  kernel is constant
On the linearity of large non-linear models: when and why the tangent kernel is constant
Chaoyue Liu
Libin Zhu
M. Belkin
21
140
0
02 Oct 2020
Deep matrix factorizations
Deep matrix factorizations
Pierre De Handschutter
Nicolas Gillis
Xavier Siebert
BDL
28
40
0
01 Oct 2020
Towards a Mathematical Understanding of Neural Network-Based Machine
  Learning: what we know and what we don't
Towards a Mathematical Understanding of Neural Network-Based Machine Learning: what we know and what we don't
E. Weinan
Chao Ma
Stephan Wojtowytsch
Lei Wu
AI4CE
22
133
0
22 Sep 2020
Implicit Regularization in Deep Learning May Not Be Explainable by Norms
Implicit Regularization in Deep Learning May Not Be Explainable by Norms
Noam Razin
Nadav Cohen
24
155
0
13 May 2020
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable
  Optimization Via Overparameterization From Depth
A Mean-field Analysis of Deep ResNet and Beyond: Towards Provable Optimization Via Overparameterization From Depth
Yiping Lu
Chao Ma
Yulong Lu
Jianfeng Lu
Lexing Ying
MLT
39
78
0
11 Mar 2020
Loss landscapes and optimization in over-parameterized non-linear
  systems and neural networks
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Chaoyue Liu
Libin Zhu
M. Belkin
ODL
4
247
0
29 Feb 2020
Optimization for deep learning: theory and algorithms
Optimization for deep learning: theory and algorithms
Ruoyu Sun
ODL
14
168
0
19 Dec 2019
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Global Convergence of Gradient Descent for Deep Linear Residual Networks
Lei Wu
Qingcan Wang
Chao Ma
ODL
AI4CE
25
22
0
02 Nov 2019
Implicit Regularization in Deep Matrix Factorization
Implicit Regularization in Deep Matrix Factorization
Sanjeev Arora
Nadav Cohen
Wei Hu
Yuping Luo
AI4CE
24
491
0
31 May 2019
Analysis of the Gradient Descent Algorithm for a Deep Neural Network
  Model with Skip-connections
Analysis of the Gradient Descent Algorithm for a Deep Neural Network Model with Skip-connections
E. Weinan
Chao Ma
Qingcan Wang
Lei Wu
MLT
27
22
0
10 Apr 2019
Every Local Minimum Value is the Global Minimum Value of Induced Model
  in Non-convex Machine Learning
Every Local Minimum Value is the Global Minimum Value of Induced Model in Non-convex Machine Learning
Kenji Kawaguchi
Jiaoyang Huang
L. Kaelbling
AAML
16
18
0
07 Apr 2019
Width Provably Matters in Optimization for Deep Linear Neural Networks
Width Provably Matters in Optimization for Deep Linear Neural Networks
S. Du
Wei Hu
16
93
0
24 Jan 2019
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU
  Networks
Stochastic Gradient Descent Optimizes Over-parameterized Deep ReLU Networks
Difan Zou
Yuan Cao
Dongruo Zhou
Quanquan Gu
ODL
22
446
0
21 Nov 2018
On the Convergence Rate of Training Recurrent Neural Networks
On the Convergence Rate of Training Recurrent Neural Networks
Zeyuan Allen-Zhu
Yuanzhi Li
Zhao-quan Song
18
191
0
29 Oct 2018
A Convergence Analysis of Gradient Descent for Deep Linear Neural
  Networks
A Convergence Analysis of Gradient Descent for Deep Linear Neural Networks
Sanjeev Arora
Nadav Cohen
Noah Golowich
Wei Hu
18
281
0
04 Oct 2018
Exponential Convergence Time of Gradient Descent for One-Dimensional
  Deep Linear Neural Networks
Exponential Convergence Time of Gradient Descent for One-Dimensional Deep Linear Neural Networks
Ohad Shamir
32
45
0
23 Sep 2018
Representing smooth functions as compositions of near-identity functions
  with implications for deep network optimization
Representing smooth functions as compositions of near-identity functions with implications for deep network optimization
Peter L. Bartlett
S. Evans
Philip M. Long
73
31
0
13 Apr 2018
1