
Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks

Main: 8 pages · Appendix: 19 pages · Bibliography: 6 pages · 23 figures · 14 tables
Abstract

Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling behavior often follows a power law, and several variants of power-law functions have been proposed to predict performance at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world decision-making problems such as estimating the performance improvement achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution from which infinitely many synthetic functions resembling real-world neural scaling laws can be sampled, allowing our PFN to meta-learn extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both existing point-estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
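To make the prior-sampling idea concrete, below is a minimal sketch of how synthetic scaling-law curves could be drawn for PFN meta-training. The saturating power-law form, the hyperparameter ranges, and the function name `sample_scaling_curve` are illustrative assumptions, not the paper's actual prior.

```python
import numpy as np

def sample_scaling_curve(x, rng):
    """Draw one synthetic scaling-law curve from a hypothetical power-law prior.

    Assumed form: y(x) = y_inf - a * x^(-b) + noise, a saturating power law
    in the data/compute budget x, with hyperparameters drawn from broad priors.
    """
    y_inf = rng.uniform(0.5, 1.0)                 # asymptotic performance level
    a = rng.uniform(0.1, 1.0)                     # scale of the power-law term
    b = rng.lognormal(mean=-1.0, sigma=0.5)       # decay exponent
    noise = rng.normal(0.0, 0.01, size=x.shape)   # observation noise
    return y_inf - a * np.power(x, -b) + noise

# Example: generate a batch of synthetic curves to meta-train a PFN on.
rng = np.random.default_rng(0)
x = np.logspace(0, 4, num=50)   # e.g. training-set sizes from 1 to 10^4
curves = np.stack([sample_scaling_curve(x, rng) for _ in range(128)])
print(curves.shape)             # (128, 50)
```

In this sketch, each sampled curve plays the role of one synthetic "dataset" resembling a real scaling law; a PFN trained on many such draws can then be conditioned on the observed small-scale points of a new curve to produce a predictive distribution over its large-scale behavior.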

@article{lee2025_2505.23032,
  title={Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks},
  author={Dongwoo Lee and Dong Bok Lee and Steven Adriaensen and Juho Lee and Sung Ju Hwang and Frank Hutter and Seon Joo Kim and Hae Beom Lee},
  journal={arXiv preprint arXiv:2505.23032},
  year={2025}
}