Theoretical Analysis of Auto Rate-Tuning by Batch Normalization

10 December 2018
Sanjeev Arora, Zhiyuan Li, Kaifeng Lyu
arXiv:1812.03981
Abstract

Batch Normalization (BN) has become a cornerstone of deep learning across diverse architectures, appearing to help optimization as well as generalization. While the idea makes intuitive sense, theoretical analysis of its effectiveness has been lacking. Here theoretical support is provided for one of its conjectured properties, namely, the ability to allow gradient descent to succeed with less tuning of learning rates. It is shown that even if we fix the learning rate of scale-invariant parameters (e.g., weights of each layer with BN) to a constant (say, 0.3), gradient descent still approaches a stationary point (i.e., a solution where the gradient is zero) at the rate of T^{-1/2} in T iterations, asymptotically matching the best bound for gradient descent with well-tuned learning rates. A similar result with convergence rate T^{-1/4} is also shown for stochastic gradient descent.
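
The fixed-learning-rate guarantee rests on the scale-invariance of BN-normalized weights: rescaling the weights feeding a batch-normalized layer by any positive constant leaves the layer's output, and hence the loss, unchanged. The sketch below is illustrative only and not code from the paper; the batch_norm helper, the dimensions, and the scale factor c = 3 are assumptions chosen to check this property numerically.

```python
# Illustrative sketch (not from the paper): numerically check the
# scale-invariance property the analysis relies on. Rescaling the weight
# matrix W of a batch-normalized layer by c > 0 leaves the normalized
# output, and therefore the loss, unchanged.
import numpy as np

def batch_norm(z, eps=1e-5):
    # Normalize each feature over the batch dimension (no learned affine part).
    mu = z.mean(axis=0, keepdims=True)
    var = z.var(axis=0, keepdims=True)
    return (z - mu) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x = rng.normal(size=(64, 10))   # a batch of 64 inputs with 10 features
W = rng.normal(size=(10, 5))    # weights of a layer followed by BN

out = batch_norm(x @ W)
out_scaled = batch_norm(x @ (3.0 * W))   # rescale the weights by c = 3

# The two outputs coincide up to the eps term in the normalization.
print(np.max(np.abs(out - out_scaled)))  # on the order of 1e-6
```

Because the loss does not change along the radial direction of such weights, their gradient is orthogonal to the weight vector and shrinks as the weight norm grows, so the effective step size of gradient descent self-adjusts even when the nominal learning rate is held fixed, which is the mechanism behind the rate stated in the abstract.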
