Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks

Scaling has been a major driver of recent advancements in deep learning. Numerous empirical studies have found that scaling laws often follow a power law and have proposed several variants of power-law functions to predict scaling behavior at larger scales. However, existing methods mostly rely on point estimation and do not quantify uncertainty, which is crucial for real-world applications involving decision-making problems such as determining the expected performance improvement achievable by investing additional computational resources. In this work, we explore a Bayesian framework based on Prior-data Fitted Networks (PFNs) for neural scaling law extrapolation. Specifically, we design a prior distribution that enables the sampling of infinitely many synthetic functions resembling real-world neural scaling laws, allowing our PFN to meta-learn the extrapolation. We validate the effectiveness of our approach on real-world neural scaling laws, comparing it against both existing point-estimation methods and Bayesian approaches. Our method demonstrates superior performance, particularly in data-limited scenarios such as Bayesian active learning, underscoring its potential for reliable, uncertainty-aware extrapolation in practical applications.
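To make the prior-sampling idea above concrete, the following is a minimal Python/NumPy sketch of drawing synthetic scaling curves from a saturating power-law prior of the form L(x) = c + a * x^(-b) with multiplicative observation noise. The functional form, the hyperparameter ranges, and the helper name sample_scaling_curve are illustrative assumptions for this sketch only; the abstract does not specify the authors' actual prior.

import numpy as np

def sample_scaling_curve(rng, n_points=32):
    """Sample one synthetic scaling curve from a hypothetical power-law prior.

    Uses the common saturating form L(x) = c + a * x**(-b) with log-normal
    observation noise; all parameter ranges are illustrative, not the paper's.
    """
    a = rng.lognormal(mean=0.0, sigma=1.0)   # scale of the decaying term
    b = rng.uniform(0.05, 1.0)               # power-law exponent
    c = rng.lognormal(mean=-2.0, sigma=1.0)  # irreducible loss floor
    noise = rng.uniform(0.0, 0.05)           # observation noise level

    # Compute budgets (e.g. dataset or model sizes), spaced log-uniformly.
    x = np.logspace(rng.uniform(0, 2), rng.uniform(4, 8), n_points)
    mean = c + a * x ** (-b)
    y = mean * np.exp(noise * rng.standard_normal(n_points))  # multiplicative noise
    return x, y

rng = np.random.default_rng(0)
# A large collection of such curves could serve as synthetic meta-training data
# for a PFN that learns to extrapolate scaling behavior in-context.
curves = [sample_scaling_curve(rng) for _ in range(1000)]

In this kind of setup, each sampled (x, y) curve plays the role of one synthetic "dataset" on which the PFN is trained to predict held-out, larger-scale points, which is what allows it to meta-learn extrapolation with calibrated uncertainty.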
@article{lee2025_2505.23032,
  title   = {Bayesian Neural Scaling Law Extrapolation with Prior-Data Fitted Networks},
  author  = {Dongwoo Lee and Dong Bok Lee and Steven Adriaensen and Juho Lee and Sung Ju Hwang and Frank Hutter and Seon Joo Kim and Hae Beom Lee},
  journal = {arXiv preprint arXiv:2505.23032},
  year    = {2025}
}