Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

14 June 2018

Lechao Xiao

Yasaman Bahri

Jascha Narain Sohl-Dickstein

S. Schoenholz

Jeffrey Pennington


Papers citing "Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks"

50 / 70 papers shown

Don't be lazy: CompleteP enables compute-efficient deep transformers Nolan Dey Bin Claire Zhang Lorenzo Noci Mufan Bill Li Blake Bordelon Shane Bergsma C. Pehlevan Boris Hanin Joel Hestness 39 0 0 02 May 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer Soham Sane ODL 56 0 0 22 Apr 2025
Fast Training of Sinusoidal Neural Fields via Scaling Initialization Taesun Yeom Sangyoon Lee Jaeho Lee 55 2 0 07 Oct 2024
Parseval Convolution Operators and Neural Networks Michael Unser Stanislas Ducotterd 25 3 0 19 Aug 2024
Equivariant Neural Tangent Kernels Philipp Misof Pan Kessel Jan E. Gerken 61 0 0 10 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training Bobby He Lorenzo Noci Daniele Paliotta Imanol Schlag Thomas Hofmann 36 3 0 29 May 2024
On the Neural Tangent Kernel of Equilibrium Models Zhili Feng J. Zico Kolter 18 6 0 21 Oct 2023
Dynamical Isometry based Rigorous Fair Neural Architecture Search Jianxiang Luo Junyi Hu Tianji Pang Weihao Huang Chuan-Hsi Liu 21 0 0 05 Jul 2023
Spike-driven Transformer Man Yao Jiakui Hu Zhaokun Zhou Liuliang Yuan Yonghong Tian Boxing Xu Guoqi Li 34 114 0 04 Jul 2023
Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage Yu Gui Cong Ma Yiqiao Zhong 22 6 0 06 Jun 2023
Robust low-rank training via approximate orthonormal constraints Dayana Savostianova Emanuele Zangrando Gianluca Ceruti Francesco Tudisco 24 9 0 02 Jun 2023
TIPS: Topologically Important Path Sampling for Anytime Neural Networks Guihong Li Kartikeya Bhardwaj Yuedong Yang R. Marculescu AAML 36 0 0 13 May 2023
Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks Eshaan Nichani Alexandru Damian Jason D. Lee MLT 38 13 0 11 May 2023
Criticality versus uniformity in deep neural networks A. Bukva Jurriaan de Gier Kevin T. Grosvenor R. Jefferson K. Schalm Eliot Schwander 28 3 0 10 Apr 2023
On the Initialisation of Wide Low-Rank Feedforward Neural Networks Thiziri Nait Saada Jared Tanner 13 1 0 31 Jan 2023
Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning Huan Wang Can Qin Yue Bai Yun Fu 32 20 0 12 Jan 2023
Orthogonal SVD Covariance Conditioning and Latent Disentanglement Yue Song N. Sebe Wei Wang 26 6 0 11 Dec 2022
Statistical Physics of Deep Neural Networks: Initialization toward Optimal Channels Kangyu Weng Aohua Cheng Ziyang Zhang Pei Sun Yang Tian 48 2 0 04 Dec 2022
Improved techniques for deterministic l2 robustness Sahil Singla S. Feizi AAML 23 9 0 15 Nov 2022
Proximal Mean Field Learning in Shallow Neural Networks Alexis M. H. Teter Iman Nodozi A. Halder FedML 43 1 0 25 Oct 2022
Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization Tran van Sang Mhd Irvan R. Yamaguchi Toshiyuki Nakata 13 1 0 11 Oct 2022
On skip connections and normalisation layers in deep optimisation L. MacDonald Jack Valmadre Hemanth Saratchandran Simon Lucey ODL 19 1 0 10 Oct 2022
Dynamical Isometry for Residual Networks Advait Gadhikar R. Burkholz ODL AI4CE 40 2 0 05 Oct 2022
Dynamical systems' based neural networks E. Celledoni Davide Murari B. Owren Carola-Bibiane Schönlieb Ferdia Sherry OOD 40 10 0 05 Oct 2022
Neural Networks Reduction via Lumping Dalila Ressi Riccardo Romanello S. Rossi Carla Piazza 30 4 0 15 Sep 2022
Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality Yue Song N. Sebe Wei Wang 16 8 0 05 Jul 2022
Fast Finite Width Neural Tangent Kernel Roman Novak Jascha Narain Sohl-Dickstein S. Schoenholz AAML 20 53 0 17 Jun 2022
Feedback Gradient Descent: Efficient and Stable Optimization with Orthogonality for DNNs Fanchen Bu D. Chang 28 6 0 12 May 2022
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers Guodong Zhang Aleksandar Botev James Martens OffRL 21 26 0 15 Mar 2022
projUNN: efficient method for training deep networks with unitary matrices B. Kiani Randall Balestriero Yann LeCun S. Lloyd 41 32 0 10 Mar 2022
A Johnson-Lindenstrauss Framework for Randomly Initialized CNNs Ido Nachum Jan Hązła Michael C. Gastpar Anatoly Khina 33 0 0 03 Nov 2021
Ridgeless Interpolation with Shallow ReLU Networks in $1D$ is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions Boris Hanin MLT 38 9 0 27 Sep 2021
Orthogonal Graph Neural Networks Kai Guo Kaixiong Zhou Xia Hu Yu Li Yi Chang Xin Wang 43 34 0 23 Sep 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks G. Bingham Risto Miikkulainen ODL 24 4 0 18 Sep 2021
Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks E. M. Achour François Malgouyres Franck Mamalet 16 20 0 12 Aug 2021
Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group J. Erdmenger Kevin T. Grosvenor R. Jefferson 54 17 0 14 Jul 2021
Marginalizable Density Models D. Gilboa Ari Pakman Thibault Vatter BDL 32 5 0 08 Jun 2021
Going deeper with Image Transformers Hugo Touvron Matthieu Cord Alexandre Sablayrolles Gabriel Synnaeve Hervé Jégou ViT 27 986 0 31 Mar 2021
Asymptotic Freeness of Layerwise Jacobians Caused by Invariance of Multilayer Perceptron: The Haar Orthogonal Case B. Collins Tomohiro Hayase 22 7 0 24 Mar 2021
RepVGG: Making VGG-style ConvNets Great Again Xiaohan Ding Xinming Zhang Ningning Ma Jungong Han Guiguang Ding Jian Sun 136 1,548 0 11 Jan 2021
Advances in Electron Microscopy with Deep Learning Jeffrey M. Ede 32 2 0 04 Jan 2021
StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking Jiachun Wang Fajie Yuan Jian Chen Qingyao Wu Min Yang Yang Sun Guoxiao Zhang BDL 40 26 0 14 Dec 2020
BYOL works even without batch statistics Pierre Harvey Richemond Jean-Bastien Grill Florent Altché Corentin Tallec Florian Strub ... Samuel L. Smith Soham De Razvan Pascanu Bilal Piot Michal Valko SSL 250 114 0 20 Oct 2020
Review: Deep Learning in Electron Microscopy Jeffrey M. Ede 31 79 0 17 Sep 2020
Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization Neha S. Wadia Daniel Duckworth S. Schoenholz Ethan Dyer Jascha Narain Sohl-Dickstein 27 13 0 17 Aug 2020
Beyond Signal Propagation: Is Feature Diversity Necessary in Deep Neural Network Initialization? Yaniv Blumenfeld D. Gilboa Daniel Soudry ODL 22 13 0 02 Jul 2020
Deep Isometric Learning for Visual Recognition Haozhi Qi Chong You Xinyu Wang Yi Ma Jitendra Malik VLM 30 53 0 30 Jun 2020
Tensor Programs II: Neural Tangent Kernel for Any Architecture Greg Yang 48 134 0 25 Jun 2020
The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry Tomohiro Hayase Ryo Karakida 27 7 0 14 Jun 2020
Batch Normalization Biases Residual Blocks Towards the Identity Function in Deep Networks Soham De Samuel L. Smith ODL 14 20 0 24 Feb 2020