  3. 2004.10802
A Neural Scaling Law from the Dimension of the Data Manifold

22 April 2020
Utkarsh Sharma
Jared Kaplan
Abstract

When data is plentiful, the loss achieved by well-trained neural networks scales as a power-law $L \propto N^{-\alpha}$ in the number of network parameters $N$. This empirical scaling law holds for a wide variety of data modalities, and may persist over many orders of magnitude. The scaling law can be explained if neural models are effectively just performing regression on a data manifold of intrinsic dimension $d$. This simple theory predicts scaling exponents $\alpha \approx 4/d$ for cross-entropy and mean-squared error losses. We confirm the theory by independently measuring the intrinsic dimension and the scaling exponents in a teacher/student framework, where we can study a variety of $d$ and $\alpha$ by dialing the properties of random teacher networks. We also test the theory with CNN image classifiers on several datasets and with GPT-type language models.
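The core quantitative claim can be illustrated with a small sketch. The snippet below generates synthetic losses that follow an idealized power law $L = C\,N^{-\alpha}$ with the paper's predicted exponent $\alpha = 4/d$, then recovers $\alpha$ by a linear fit in log-log space; the dimension $d$, the constant $C$, and the range of $N$ are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Assumed intrinsic data-manifold dimension (illustrative choice).
d = 8
alpha_true = 4.0 / d       # the paper's predicted scaling exponent
N = np.logspace(4, 8, 20)  # parameter counts over four orders of magnitude
L = 3.0 * N ** (-alpha_true)  # idealized power-law loss; C = 3.0 is arbitrary

# On a log-log plot a power law is a straight line, so the (negated)
# slope of log L versus log N estimates the exponent alpha.
slope, intercept = np.polyfit(np.log(N), np.log(L), 1)
alpha_est = -slope
print(round(alpha_est, 3))  # recovers alpha = 4/d = 0.5
```

In practice the measured losses are noisy and only approximately power-law, so the same fit is applied to empirical $(N, L)$ pairs and the estimated $\alpha$ is compared against $4/d$ with $d$ measured independently (e.g. by nearest-neighbor intrinsic-dimension estimators).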
