Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

14 June 2018
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington
arXiv:1806.05393

Papers citing "Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks"

50 / 77 papers shown
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, Joel Hestness
02 May 2025

AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
[ODL]
22 Apr 2025

Fast Training of Sinusoidal Neural Fields via Scaling Initialization
Taesun Yeom, Sangyoon Lee, Jaeho Lee
07 Oct 2024

Parseval Convolution Operators and Neural Networks
Michael Unser, Stanislas Ducotterd
19 Aug 2024

Equivariant Neural Tangent Kernels
Philipp Misof, Pan Kessel, Jan E. Gerken
10 Jun 2024

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann
29 May 2024

On the Neural Tangent Kernel of Equilibrium Models
Zhili Feng, J. Zico Kolter
21 Oct 2023

Dynamical Isometry based Rigorous Fair Neural Architecture Search
Jianxiang Luo, Junyi Hu, Tianji Pang, Weihao Huang, Chuan-Hsi Liu
05 Jul 2023

Spike-driven Transformer
Man Yao, Jiakui Hu, Zhaokun Zhou, Liuliang Yuan, Yonghong Tian, Boxing Xu, Guoqi Li
04 Jul 2023

Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage
Yu Gui, Cong Ma, Yiqiao Zhong
06 Jun 2023

Robust low-rank training via approximate orthonormal constraints
Dayana Savostianova, Emanuele Zangrando, Gianluca Ceruti, Francesco Tudisco
02 Jun 2023

TIPS: Topologically Important Path Sampling for Anytime Neural Networks
Guihong Li, Kartikeya Bhardwaj, Yuedong Yang, R. Marculescu
[AAML]
13 May 2023

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks
Eshaan Nichani, Alexandru Damian, Jason D. Lee
[MLT]
11 May 2023

Criticality versus uniformity in deep neural networks
A. Bukva, Jurriaan de Gier, Kevin T. Grosvenor, R. Jefferson, K. Schalm, Eliot Schwander
10 Apr 2023

On the Initialisation of Wide Low-Rank Feedforward Neural Networks
Thiziri Nait Saada, Jared Tanner
31 Jan 2023

Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning
Huan Wang, Can Qin, Yue Bai, Yun Fu
12 Jan 2023

Orthogonal SVD Covariance Conditioning and Latent Disentanglement
Yue Song, N. Sebe, Wei Wang
11 Dec 2022

Statistical Physics of Deep Neural Networks: Initialization toward Optimal Channels
Kangyu Weng, Aohua Cheng, Ziyang Zhang, Pei Sun, Yang Tian
04 Dec 2022

Improved techniques for deterministic l2 robustness
Sahil Singla, S. Feizi
[AAML]
15 Nov 2022

Proximal Mean Field Learning in Shallow Neural Networks
Alexis M. H. Teter, Iman Nodozi, A. Halder
[FedML]
25 Oct 2022

Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization
Tran van Sang, Mhd Irvan, R. Yamaguchi, Toshiyuki Nakata
11 Oct 2022

On skip connections and normalisation layers in deep optimisation
L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey
[ODL]
10 Oct 2022

Dynamical Isometry for Residual Networks
Advait Gadhikar, R. Burkholz
[ODL, AI4CE]
05 Oct 2022

Dynamical systems' based neural networks
E. Celledoni, Davide Murari, B. Owren, Carola-Bibiane Schönlieb, Ferdia Sherry
[OOD]
05 Oct 2022

Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks
Andrea Ceni
[ODL]
03 Oct 2022

Neural Networks Reduction via Lumping
Dalila Ressi, Riccardo Romanello, S. Rossi, Carla Piazza
15 Sep 2022

Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality
Yue Song, N. Sebe, Wei Wang
05 Jul 2022

AutoInit: Automatic Initialization via Jacobian Tuning
Tianyu He, Darshil Doshi, Andrey Gromov
27 Jun 2022

Fast Finite Width Neural Tangent Kernel
Roman Novak, Jascha Narain Sohl-Dickstein, S. Schoenholz
[AAML]
17 Jun 2022

Feedback Gradient Descent: Efficient and Stable Optimization with Orthogonality for DNNs
Fanchen Bu, D. Chang
12 May 2022

Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang, Aleksandar Botev, James Martens
[OffRL]
15 Mar 2022

projUNN: efficient method for training deep networks with unitary matrices
B. Kiani, Randall Balestriero, Yann LeCun, S. Lloyd
10 Mar 2022

A Johnson-Lindenstrauss Framework for Randomly Initialized CNNs
Ido Nachum, Jan Hązła, Michael C. Gastpar, Anatoly Khina
03 Nov 2021

RMNet: Equivalently Removing Residual Connection from Networks
Fanxu Meng, Hao Cheng, Jia-Xin Zhuang, Ke Li, Xing Sun
01 Nov 2021

Ridgeless Interpolation with Shallow ReLU Networks in 1D is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions
Boris Hanin
[MLT]
27 Sep 2021

Orthogonal Graph Neural Networks
Kai Guo, Kaixiong Zhou, Xia Hu, Yu Li, Yi Chang, Xin Wang
23 Sep 2021

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham, Risto Miikkulainen
[ODL]
18 Sep 2021

Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
E. M. Achour, François Malgouyres, Franck Mamalet
12 Aug 2021

Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group
J. Erdmenger, Kevin T. Grosvenor, R. Jefferson
14 Jul 2021

Marginalizable Density Models
D. Gilboa, Ari Pakman, Thibault Vatter
[BDL]
08 Jun 2021

A Geometric Analysis of Neural Collapse with Unconstrained Features
Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu
06 May 2021

Going deeper with Image Transformers
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou
[ViT]
31 Mar 2021

Asymptotic Freeness of Layerwise Jacobians Caused by Invariance of Multilayer Perceptron: The Haar Orthogonal Case
B. Collins, Tomohiro Hayase
24 Mar 2021

RepVGG: Making VGG-style ConvNets Great Again
Xiaohan Ding, Xinming Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun
11 Jan 2021

Advances in Electron Microscopy with Deep Learning
Jeffrey M. Ede
04 Jan 2021

StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking
Jiachun Wang, Fajie Yuan, Jian Chen, Qingyao Wu, Min Yang, Yang Sun, Guoxiao Zhang
[BDL]
14 Dec 2020

BYOL works even without batch statistics
Pierre Harvey Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, ..., Samuel L. Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko
[SSL]
20 Oct 2020

Tensor Programs III: Neural Matrix Laws
Greg Yang
22 Sep 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
17 Sep 2020

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein
17 Aug 2020