The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks
Ryo Karakida, S. Akaho, S. Amari
7 June 2019
ArXiv · PDF · HTML

Papers citing "The Normalization Method for Alleviating Pathological Sharpness in Wide Neural Networks"

29 papers shown

Parallel Layer Normalization for Universal Approximation
Yunhao Ni, Yuhe Liu, Wenxin Sun, Yitong Tang, Yuxin Guo, Peilin Feng, Wenjun Wu, Lei Huang
19 May 2025 · 20 / 0 / 0

Non-identifiability distinguishes Neural Networks among Parametric Models
Sourav Chatterjee, Timothy Sudijono
25 Apr 2025 · 40 / 0 / 0

Transformers without Normalization
Jiachen Zhu, Xinlei Chen, Kaiming He, Yann LeCun, Zhuang Liu
13 Mar 2025 · ViT, OffRL · 84 / 8 / 0

Towards the Spectral bias Alleviation by Normalizations in Coordinate Networks
Zhicheng Cai, Hao Zhu, Qiu Shen, Xinran Wang, Xun Cao
25 Jul 2024 · 78 / 0 / 0

On the Nonlinearity of Layer Normalization
Yunhao Ni, Yuxin Guo, Junlong Jia, Lei Huang
03 Jun 2024 · 52 / 5 / 0

CHAIN: Enhancing Generalization in Data-Efficient GANs via lipsCHitz continuity constrAIned Normalization
Yao Ni, Piotr Koniusz
31 Mar 2024 · AI4CE, GAN · 45 / 1 / 0

Neuro-Visualizer: An Auto-encoder-based Loss Landscape Visualization Method
Mohannad Elhamod, Anuj Karpatne
26 Sep 2023 · 47 / 1 / 0

Component-Wise Natural Gradient Descent -- An Efficient Neural Network Optimization
Tran van Sang, Mhd Irvan, R. Yamaguchi, Toshiyuki Nakata
11 Oct 2022 · 26 / 1 / 0

Analyzing Sharpness along GD Trajectory: Progressive Sharpening and Edge of Stability
Z. Li, Zixuan Wang, Jian Li
26 Jul 2022 · 31 / 44 / 0

Understanding the Generalization Benefit of Normalization Layers: Sharpness Reduction
Kaifeng Lyu, Zhiyuan Li, Sanjeev Arora
14 Jun 2022 · FAtt · 54 / 71 / 0

Beyond accuracy: generalization properties of bio-plausible temporal credit assignment rules
Yuhan Helena Liu, Arna Ghosh, Blake A. Richards, E. Shea-Brown, Guillaume Lajoie
02 Jun 2022 · 53 / 9 / 0

TorchNTK: A Library for Calculation of Neural Tangent Kernels of PyTorch Models
A. Engel, Zhichao Wang, Anand D. Sarwate, Sutanay Choudhury, Tony Chiang
24 May 2022 · 47 / 3 / 0

Beyond BatchNorm: Towards a Unified Understanding of Normalization in Deep Learning
Ekdeep Singh Lubana, Robert P. Dick, Hidenori Tanaka
10 Jun 2021 · 38 / 35 / 0

Batch Normalization Orthogonalizes Representations in Deep Random Networks
Hadi Daneshmand, Amir Joudaki, Francis R. Bach
07 Jun 2021 · OOD · 17 / 37 / 0

Asymptotic Freeness of Layerwise Jacobians Caused by Invariance of Multilayer Perceptron: The Haar Orthogonal Case
B. Collins, Tomohiro Hayase
24 Mar 2021 · 33 / 7 / 0

ASAM: Adaptive Sharpness-Aware Minimization for Scale-Invariant Learning of Deep Neural Networks
Jungmin Kwon, Jeongseop Kim, Hyunseong Park, I. Choi
23 Feb 2021 · 53 / 287 / 0

Dissecting Hessian: Understanding Common Structure of Hessian in Neural Networks
Yikai Wu, Xingyu Zhu, Chenwei Wu, Annie Wang, Rong Ge
08 Oct 2020 · 35 / 43 / 0

Understanding Approximate Fisher Information for Fast Convergence of Natural Gradient Descent in Wide Neural Networks
Ryo Karakida, Kazuki Osawa
02 Oct 2020 · 27 / 26 / 0

Group Whitening: Balancing Learning Efficiency and Representational Capacity
Lei Huang, Yi Zhou, Li Liu, Fan Zhu, Ling Shao
28 Sep 2020 · 38 / 21 / 0

Normalization Techniques in Training DNNs: Methodology, Analysis and Application
Lei Huang, Jie Qin, Yi Zhou, Fan Zhu, Li Liu, Ling Shao
27 Sep 2020 · AI4CE · 32 / 258 / 0

Spherical Perspective on Learning with Normalization Layers
Simon Roburin, Yann de Mont-Marin, Andrei Bursuc, Renaud Marlet, P. Pérez, Mathieu Aubry
23 Jun 2020 · 16 / 6 / 0

When Does Preconditioning Help or Hurt Generalization?
S. Amari, Jimmy Ba, Roger C. Grosse, Xuechen Li, Atsushi Nitanda, Taiji Suzuki, Denny Wu, Ji Xu
18 Jun 2020 · 41 / 32 / 0

The Spectrum of Fisher Information of Deep Networks Achieving Dynamical Isometry
Tomohiro Hayase, Ryo Karakida
14 Jun 2020 · 34 / 7 / 0

Batch Normalization Provably Avoids Rank Collapse for Randomly Initialised Deep Networks
Hadi Daneshmand, Jonas Köhler, Francis R. Bach, Thomas Hofmann, Aurelien Lucchi
03 Mar 2020 · OOD, ODL · 10 / 4 / 0

Any Target Function Exists in a Neighborhood of Any Sufficiently Wide Random Network: A Geometrical Perspective
S. Amari
20 Jan 2020 · 32 / 12 / 0

Pathological spectra of the Fisher information metric and its variants in deep neural networks
Ryo Karakida, S. Akaho, S. Amari
14 Oct 2019 · 33 / 28 / 0

Dynamical Isometry and a Mean Field Theory of CNNs: How to Train 10,000-Layer Vanilla Convolutional Neural Networks
Lechao Xiao, Yasaman Bahri, Jascha Narain Sohl-Dickstein, S. Schoenholz, Jeffrey Pennington
14 Jun 2018 · 250 / 350 / 0

Universal Statistics of Fisher Information in Deep Neural Networks: Mean Field Approach
Ryo Karakida, S. Akaho, S. Amari
04 Jun 2018 · FedML · 54 / 141 / 0

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
15 Sep 2016 · ODL · 318 / 2,904 / 0