Batch normalization is one of the most important regularization techniques for neural networks, significantly improving training by centering the layers of the neural network. There have been several attempts to provide a theoretical justification for batch normalization. Santurkar and Tsipras (2018) [How does batch normalization help optimization? Advances in Neural Information Processing Systems, 31] claim that batch normalization improves initialization. We provide a counterexample showing that this claim is not true, i.e., batch normalization does not improve initialization.
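For context, the transform under discussion can be sketched as follows: each layer's activations are centered and rescaled over the mini-batch, then shifted and scaled by learnable parameters. This is a minimal illustrative sketch of standard batch normalization, not code from the paper; the function name and parameters are chosen only for illustration.

    import numpy as np

    def batch_norm(x, gamma, beta, eps=1e-5):
        # x: activations of shape (batch_size, features)
        mean = x.mean(axis=0)                     # per-feature batch mean
        var = x.var(axis=0)                       # per-feature batch variance
        x_hat = (x - mean) / np.sqrt(var + eps)   # center and rescale over the batch
        return gamma * x_hat + beta               # learnable scale and shift

    # Example: a batch of 4 samples with 3 features
    x = np.random.randn(4, 3)
    y = batch_norm(x, gamma=np.ones(3), beta=np.zeros(3))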
@article{dannemann2025_2502.17913,
  title   = {Batch normalization does not improve initialization},
  author  = {Joris Dannemann and Gero Junike},
  journal = {arXiv preprint arXiv:2502.17913},
  year    = {2025}
}