arXiv:2406.02157
Online Learning and Information Exponents: On The Importance of Batch size, and Time/Complexity Tradeoffs

4 June 2024
Luca Arnaboldi, Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
Abstract

We study the impact of the batch size $n_b$ on the iteration time $T$ of training two-layer neural networks with one-pass stochastic gradient descent (SGD) on multi-index target functions of isotropic covariates. We characterize the optimal batch size minimizing the iteration time as a function of the hardness of the target, as captured by the information exponents. We show that performing gradient updates with large batches $n_b \lesssim d^{\ell/2}$ minimizes the training time without changing the total sample complexity, where $\ell$ is the information exponent of the target to be learned \citep{arous2021online} and $d$ is the input dimension. However, batch sizes beyond $n_b \gg d^{\ell/2}$ are detrimental for improving the time complexity of SGD. We provably overcome this fundamental limitation via a different training protocol, \textit{Correlation loss SGD}, which suppresses the auto-correlation terms in the loss function. We show that the training progress can be tracked by a system of low-dimensional ordinary differential equations (ODEs). Finally, we validate our theoretical results with numerical experiments.
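The sketch below is only a toy illustration of the distinction between the two training protocols mentioned in the abstract: it runs one-pass SGD on a single-index teacher (a simplification of the paper's multi-index, two-layer setting) and compares the standard squared loss with a correlation loss that drops the auto-correlation term. The dimension, activation, learning rate, and batch size are illustrative assumptions, not the scalings analyzed in the paper.

```python
import numpy as np

# Minimal sketch (illustrative assumptions, not the paper's exact protocol):
# one-pass SGD on a single-index teacher y = g(<w*, x>) with isotropic Gaussian
# covariates. It contrasts the squared loss, whose gradient carries the
# auto-correlation term coming from f(x)^2, with the correlation loss -y * f(x),
# which suppresses it.

rng = np.random.default_rng(0)
d = 256                        # input dimension (assumption)
n_b = 64                       # batch size n_b, the quantity studied in the paper
n_steps = 500                  # one-pass iterations: fresh samples at every step
lr = 0.05
g = lambda z: z ** 2           # link with information exponent ell = 2 (assumption)
dg = lambda z: 2 * z           # its derivative

w_star = rng.standard_normal(d)
w_star /= np.linalg.norm(w_star)      # unit-norm teacher direction

def final_overlap(loss: str) -> float:
    """One-pass SGD with the given loss; returns |<w, w*>|, the teacher-student overlap."""
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    for _ in range(n_steps):
        X = rng.standard_normal((n_b, d))   # fresh isotropic covariates: online / one-pass SGD
        y = g(X @ w_star)                   # teacher labels
        pre = X @ w                         # student pre-activations
        out = g(pre)
        # dL/d(out): the squared loss keeps the 'out' (auto-correlation) term,
        # the correlation loss -y * out drops it.
        dl = (out - y) if loss == "squared" else (-y)
        grad = (X * (dl * dg(pre))[:, None]).mean(axis=0)
        w -= lr * grad
        w /= np.linalg.norm(w)              # spherical constraint, as in online SGD analyses
    return abs(w @ w_star)

print("overlap, squared loss     :", final_overlap("squared"))
print("overlap, correlation loss :", final_overlap("correlation"))
```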
