The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks

7 June 2019

Papers citing "The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks"


Parallel Layer Normalization for Universal Approximation Yunhao Ni Yuhe Liu Wenxin Sun Yitong Tang Yuxin Guo Peilin Feng Wenjun Wu Lei Huang 20 0 0 19 May 2025
Non-identifiability distinguishes Neural Networks among Parametric Models Sourav Chatterjee Timothy Sudijono 40 0 0 25 Apr 2025
Transformers without Normalization Jiachen Zhu Xinlei Chen Kaiming He Yann LeCun Zhuang Liu ViT OffRL 84 8 0 13 Mar 2025
Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks Zhicheng Cai Hao Zhu Qiu Shen Xinran Wang Xun Cao 78 0 0 25 Jul 2024
On the Nonlinearity of Layer Normalization Yunhao Ni Yuxin Guo Junlong Jia Lei Huang 52 5 0 03 Jun 2024
CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization Yao Ni Piotr Koniusz AI4CE GAN 45 1 0 31 Mar 2024
Neuro-Visualizer: An Auto-encoder-based Loss Landscape Visualization Method Mohannad Elhamod Anuj Karpatne 47 1 0 26 Sep 2023
Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization Tran van Sang Mhd Irvan R. Yamaguchi Toshiyuki Nakata 26 1 0 11 Oct 2022
Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability Z. Li Zixuan Wang Jian Li 31 44 0 26 Jul 2022
Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction Kaifeng Lyu Zhiyuan Li Sanjeev Arora FAtt 54 71 0 14 Jun 2022
Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules Yuhan Helena Liu Arna Ghosh Blake A. Richards E. Shea-Brown Guillaume Lajoie 53 9 0 02 Jun 2022
TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models A. Engel Zhichao Wang Anand D. Sarwate Sutanay Choudhury Tony Chiang 47 3 0 24 May 2022
Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning Ekdeep Singh Lubana Robert P. Dick Hidenori Tanaka 38 35 0 10 Jun 2021
Batch Normalization Orthogonalizes Representations in Deep Random Networks Hadi Daneshmand Amir Joudaki Francis R. Bach OOD 17 37 0 07 Jun 2021
Asymptotic Freeness of Layerwise Jacobians Caused by Invariance of Multilayer Perceptron: The Haar Orthogonal Case B. Collins Tomohiro Hayase 33 7 0 24 Mar 2021
ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks Jungmin Kwon Jeongseop Kim Hyunseong Park I. Choi 53 287 0 23 Feb 2021
Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks Yikai Wu Xingyu Zhu Chenwei Wu Annie Wang Rong Ge 35 43 0 08 Oct 2020
Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks Ryo Karakida Kazuki Osawa 27 26 0 02 Oct 2020
Group Whitening: Balancing Learning Efficiency and Representational Capacity Lei Huang Yi Zhou Li Liu Fan Zhu Ling Shao 38 21 0 28 Sep 2020
Normalization Techniques in Training DNNs: Methodology, Analysis and Application Lei Huang Jie Qin Yi Zhou Fan Zhu Li Liu Ling Shao AI4CE 32 258 0 27 Sep 2020
Spherical Perspective on Learning with Normalization Layers Simon Roburin Yann de Mont-Marin Andrei Bursuc Renaud Marlet P. Pérez Mathieu Aubry 16 6 0 23 Jun 2020
When Does Preconditioning Help or Hurt Generalization? S. Amari Jimmy Ba Roger C. Grosse Xuechen Li Atsushi Nitanda Taiji Suzuki Denny Wu Ji Xu 41 32 0 18 Jun 2020
The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry Tomohiro Hayase Ryo Karakida 34 7 0 14 Jun 2020
Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks Hadi Daneshmand Jonas Köhler Francis R. Bach Thomas Hofmann Aurelien Lucchi OOD ODL 10 4 0 03 Mar 2020
Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective S. Amari 32 12 0 20 Jan 2020
Pathological spectra of the Fisher information metric and its variants in deep neural networks Ryo Karakida S. Akaho S. Amari 33 28 0 14 Oct 2019
Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks Lechao Xiao Yasaman Bahri Jascha Narain Sohl-Dickstein S. Schoenholz Jeffrey Pennington 250 350 0 14 Jun 2018
Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach Ryo Karakida S. Akaho S. Amari FedML 54 141 0 04 Jun 2018
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima N. Keskar Dheevatsa Mudigere J. Nocedal M. Smelyanskiy P. T. P. Tang ODL 318 2,904 0 15 Sep 2016