Depth Dependence of μP Learning Rates in ReLU MLPs

13 May 2023
Samy Jelassi
Boris Hanin
Ziwei Ji
Sashank J. Reddi
Srinadh Bhojanapalli
Sanjiv Kumar
arXiv:2305.07810
Abstract

In this short note we consider random fully connected ReLU networks of width $n$ and depth $L$ equipped with a mean-field weight initialization. Our purpose is to study the dependence on $n$ and $L$ of the maximal update ($\mu$P) learning rate, the largest learning rate for which the mean squared change in pre-activations after one step of gradient descent remains uniformly bounded at large $n, L$. As in prior work on $\mu$P by Yang et al., we find that this maximal update learning rate is independent of $n$ for all but the first and last layer weights. However, we find that it has a non-trivial dependence on $L$, scaling like $L^{-3/2}$.
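
The quantity in the abstract is easy to probe numerically. Below is a minimal sketch, not the authors' code: it takes a single gradient descent step in a random ReLU MLP and measures the mean squared change in hidden-layer pre-activations, with the learning rate scaled like $L^{-3/2}$. The width (256), the depths, the base learning rate, and the He-style hidden-layer initialization standing in for the mean-field parameterization are all illustrative assumptions.

# A minimal empirical sketch (not the authors' code) of the quantity studied
# here: the mean squared change in hidden-layer pre-activations after one
# gradient descent step. Width, depths, base learning rate, and the He-style
# hidden-layer init (a stand-in for the mean-field parameterization) are all
# illustrative assumptions.
import torch

def forward(weights, x):
    # Return all hidden pre-activations plus the network output.
    h, preacts = x, []
    for W in weights[:-1]:
        z = h @ W.t()
        preacts.append(z)
        h = torch.relu(z)
    return preacts, h @ weights[-1].t()

def mean_sq_preact_change(n, L, lr, seed=0):
    torch.manual_seed(seed)
    x, y = torch.randn(1, n), torch.randn(1, 1)
    # L hidden layers with variance-2/n entries and a 1/n-scaled readout.
    weights = [torch.randn(n, n) * (2.0 / n) ** 0.5 for _ in range(L)]
    weights.append(torch.randn(1, n) / n)
    for W in weights:
        W.requires_grad_(True)

    preacts_before, out = forward(weights, x)
    ((out - y) ** 2).mean().backward()
    with torch.no_grad():
        for W in weights:  # one plain gradient descent step
            W -= lr * W.grad
        preacts_after, _ = forward(weights, x)
        # Squared pre-activation change, averaged over units and layers.
        per_layer = [((b - a) ** 2).mean()
                     for a, b in zip(preacts_before, preacts_after)]
        return torch.stack(per_layer).mean().item()

if __name__ == "__main__":
    n, base_lr = 256, 1.0
    for L in (4, 8, 16, 32):
        lr = base_lr * L ** -1.5
        change = mean_sq_preact_change(n, L, lr)
        print(f"L={L:3d}  lr={lr:.4f}  mean sq preact change={change:.3e}")

If the $L^{-3/2}$ scaling of the abstract holds, the printed change should stay of roughly the same order as $L$ grows, whereas a depth-independent learning rate would let it grow with depth.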
