Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks

14 June 2018
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington
arXiv:1806.05393

Papers citing "Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks"

50 / 77 papers shown
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey, Bin Claire Zhang, Lorenzo Noci, Mufan Li, Blake Bordelon, Shane Bergsma, Cengiz Pehlevan, Boris Hanin, Joel Hestness
02 May 2025

AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
[ODL]
22 Apr 2025

Fast Training of Sinusoidal Neural Fields via Scaling Initialization
Taesun Yeom, Sangyoon Lee, Jaeho Lee
07 Oct 2024

Parseval Convolution Operators and Neural Networks
Michael Unser, Stanislas Ducotterd
19 Aug 2024

Equivariant Neural Tangent Kernels
Philipp Misof, Pan Kessel, Jan E. Gerken
10 Jun 2024

Understanding and Minimising Outlier Features in Neural Network Training
Bobby He, Lorenzo Noci, Daniele Paliotta, Imanol Schlag, Thomas Hofmann
29 May 2024

On the Neural Tangent Kernel of Equilibrium Models
Zhili Feng, J. Zico Kolter
21 Oct 2023

Dynamical Isometry based Rigorous Fair Neural Architecture Search
Jianxiang Luo, Junyi Hu, Tianji Pang, Weihao Huang, Chuan-Hsi Liu
05 Jul 2023

Spike-driven Transformer
Man Yao, Jiakui Hu, Zhaokun Zhou, Liuliang Yuan, Yonghong Tian, Boxing Xu, Guoqi Li
04 Jul 2023

Unraveling Projection Heads in Contrastive Learning: Insights from Expansion and Shrinkage
Yu Gui, Cong Ma, Yiqiao Zhong
06 Jun 2023

Robust low-rank training via approximate orthonormal constraints
Dayana Savostianova, Emanuele Zangrando, Gianluca Ceruti, Francesco Tudisco
02 Jun 2023

TIPS: Topologically Important Path Sampling for Anytime Neural Networks
Guihong Li, Kartikeya Bhardwaj, Yuedong Yang, R. Marculescu
[AAML]
13 May 2023

Provable Guarantees for Nonlinear Feature Learning in Three-Layer Neural Networks
Eshaan Nichani, Alexandru Damian, Jason D. Lee
[MLT]
11 May 2023

Criticality versus uniformity in deep neural networks
A. Bukva, Jurriaan de Gier, Kevin T. Grosvenor, R. Jefferson, K. Schalm, Eliot Schwander
10 Apr 2023

On the Initialisation of Wide Low-Rank Feedforward Neural Networks
Thiziri Nait Saada, Jared Tanner
31 Jan 2023

Why is the State of Neural Network Pruning so Confusing? On the Fairness, Comparison Setup, and Trainability in Network Pruning
Huan Wang, Can Qin, Yue Bai, Yun Fu
12 Jan 2023

Orthogonal SVD Covariance Conditioning and Latent Disentanglement
Yue Song, N. Sebe, Wei Wang
11 Dec 2022

Statistical Physics of Deep Neural Networks: Initialization toward Optimal Channels
Kangyu Weng, Aohua Cheng, Ziyang Zhang, Pei Sun, Yang Tian
04 Dec 2022

Improved techniques for deterministic l2 robustness
Sahil Singla, S. Feizi
[AAML]
15 Nov 2022

Proximal Mean Field Learning in Shallow Neural Networks
Alexis M. H. Teter, Iman Nodozi, A. Halder
[FedML]
25 Oct 2022

Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization
Tran van Sang, Mhd Irvan, R. Yamaguchi, Toshiyuki Nakata
11 Oct 2022

On skip connections and normalisation layers in deep optimisation
L. MacDonald, Jack Valmadre, Hemanth Saratchandran, Simon Lucey
[ODL]
10 Oct 2022

Dynamical Isometry for Residual Networks
Advait Gadhikar, R. Burkholz
[ODL, AI4CE]
05 Oct 2022

Dynamical systems' based neural networks
E. Celledoni, Davide Murari, B. Owren, Carola-Bibiane Schönlieb, Ferdia Sherry
[OOD]
05 Oct 2022

Random orthogonal additive filters: a solution to the vanishing/exploding gradient of deep neural networks
Andrea Ceni
[ODL]
03 Oct 2022

Neural Networks Reduction via Lumping
Dalila Ressi, Riccardo Romanello, S. Rossi, Carla Piazza
15 Sep 2022

Improving Covariance Conditioning of the SVD Meta-layer by Orthogonality
Yue Song, N. Sebe, Wei Wang
05 Jul 2022

AutoInit: Automatic Initialization via Jacobian Tuning
Tianyu He, Darshil Doshi, Andrey Gromov
27 Jun 2022

Fast Finite Width Neural Tangent Kernel
Roman Novak, Jascha Narain Sohl-Dickstein, S. Schoenholz
[AAML]
17 Jun 2022

Feedback Gradient Descent: Efficient and Stable Optimization with Orthogonality for DNNs
Fanchen Bu, D. Chang
12 May 2022

Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang, Aleksandar Botev, James Martens
[OffRL]
15 Mar 2022

projUNN: efficient method for training deep networks with unitary matrices
B. Kiani, Randall Balestriero, Yann LeCun, S. Lloyd
10 Mar 2022

A Johnson-Lindenstrauss Framework for Randomly Initialized CNNs
Ido Nachum, Jan Hązła, Michael C. Gastpar, Anatoly Khina
03 Nov 2021

RMNet: Equivalently Removing Residual Connection from Networks
Fanxu Meng, Hao Cheng, Jia-Xin Zhuang, Ke Li, Xing Sun
01 Nov 2021

Ridgeless Interpolation with Shallow ReLU Networks in 1D is Nearest Neighbor Curvature Extrapolation and Provably Generalizes on Lipschitz Functions
Boris Hanin
[MLT]
27 Sep 2021

Orthogonal Graph Neural Networks
Kai Guo, Kaixiong Zhou, Xia Hu, Yu Li, Yi Chang, Xin Wang
23 Sep 2021

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham, Risto Miikkulainen
[ODL]
18 Sep 2021

Existence, Stability and Scalability of Orthogonal Convolutional Neural Networks
E. M. Achour, François Malgouyres, Franck Mamalet
12 Aug 2021

Towards quantifying information flows: relative entropy in deep neural networks and the renormalization group
J. Erdmenger, Kevin T. Grosvenor, R. Jefferson
14 Jul 2021

Marginalizable Density Models
D. Gilboa, Ari Pakman, Thibault Vatter
[BDL]
08 Jun 2021

A Geometric Analysis of Neural Collapse with Unconstrained Features
Zhihui Zhu, Tianyu Ding, Jinxin Zhou, Xiao Li, Chong You, Jeremias Sulam, Qing Qu
06 May 2021

Going deeper with Image Transformers
Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, Hervé Jégou
[ViT]
31 Mar 2021

Asymptotic Freeness of Layerwise Jacobians Caused by Invariance of Multilayer Perceptron: The Haar Orthogonal Case
B. Collins, Tomohiro Hayase
24 Mar 2021

RepVGG: Making VGG-style ConvNets Great Again
Xiaohan Ding, Xinming Zhang, Ningning Ma, Jungong Han, Guiguang Ding, Jian Sun
11 Jan 2021

Advances in Electron Microscopy with Deep Learning
Jeffrey M. Ede
04 Jan 2021

StackRec: Efficient Training of Very Deep Sequential Recommender Models by Iterative Stacking
Jiachun Wang, Fajie Yuan, Jian Chen, Qingyao Wu, Min Yang, Yang Sun, Guoxiao Zhang
[BDL]
14 Dec 2020

BYOL works even without batch statistics
Pierre Harvey Richemond, Jean-Bastien Grill, Florent Altché, Corentin Tallec, Florian Strub, ..., Samuel L. Smith, Soham De, Razvan Pascanu, Bilal Piot, Michal Valko
[SSL]
20 Oct 2020

Tensor Programs III: Neural Matrix Laws
Greg Yang
22 Sep 2020

Review: Deep Learning in Electron Microscopy
Jeffrey M. Ede
17 Sep 2020

Whitening and second order optimization both make information in the dataset unusable during training, and can reduce or prevent generalization
Neha S. Wadia, Daniel Duckworth, S. Schoenholz, Ethan Dyer, Jascha Narain Sohl-Dickstein
17 Aug 2020