Loss-to-Loss Prediction: Scaling Laws for All Datasets

19 November 2024

Papers citing "Loss-to-Loss Prediction: Scaling Laws for All Datasets"

Title | Authors | Tags | Counts | Date
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws | Yiding Jiang, Allan Zhou, Zhili Feng, Sadhika Malladi, J. Zico Kolter | | 79 22 0 | 15 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws | Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan | | 104 17 0 | 26 Sep 2024
Deconstructing What Makes a Good Optimizer for Language Models | Rosie Zhao, Depen Morwani, David Brandfonbrener, Nikhil Vyas, Sham Kakade | | 103 25 0 | 10 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models | Tomer Porian, Mitchell Wortsman, J. Jitsev, Ludwig Schmidt, Y. Carmon | | 123 26 0 | 27 Jun 2024
Scaling and renormalization in high-dimensional regression | Alexander B. Atanasov, Jacob A. Zavatone-Veth, Cengiz Pehlevan | | 60 20 0 | 01 May 2024
Chinchilla Scaling: A replication attempt | T. Besiroglu, Ege Erdil, Matthew Barnett, Josh You | | 78 24 0 | 15 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance | Jiasheng Ye, Peiju Liu, Tianxiang Sun, Yunhua Zhou, Jun Zhan, Xipeng Qiu | | 110 76 0 | 25 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective | Zhengxiao Du, Aohan Zeng, Yuxiao Dong, Jie Tang | UQCV LRM | 117 54 0 | 23 Mar 2024
A Dynamical Model of Neural Scaling Laws | Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan | | 96 44 0 | 02 Feb 2024
Small-scale proxies for large-scale Transformer training instabilities | Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, A. Alemi, ..., Jascha Narain Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith | | 87 99 0 | 25 Sep 2023
Training Compute-Optimal Large Language Models | Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre | AI4TS | 208 1,980 0 | 29 Mar 2022
Datamodels: Predicting Predictions from Training Data | Andrew Ilyas, Sung Min Park, Logan Engstrom, Guillaume Leclerc, Aleksander Madry | TDI | 131 141 0 | 01 Feb 2022
Covariate Shift in High-Dimensional Random Feature Regression | Nilesh Tripuraneni, Ben Adlam, Jeffrey Pennington | OOD | 45 24 0 | 16 Nov 2021
Exploring the Limits of Large Scale Pre-training | Samira Abnar, Mostafa Dehghani, Behnam Neyshabur, Hanie Sedghi | AI4CE | 97 119 0 | 05 Oct 2021
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization | John Miller, Rohan Taori, Aditi Raghunathan, Shiori Sagawa, Pang Wei Koh, Vaishaal Shankar, Percy Liang, Y. Carmon, Ludwig Schmidt | OODD OOD | 91 278 0 | 09 Jul 2021
Explaining Neural Scaling Laws | Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma | | 75 269 0 | 12 Feb 2021
Learning Curve Theory | Marcus Hutter | | 216 64 0 | 08 Feb 2021
Scaling Laws for Transfer | Danny Hernandez, Jared Kaplan, T. Henighan, Sam McCandlish | | 90 250 0 | 02 Feb 2021
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks | Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan | | 235 208 0 | 07 Feb 2020
Scaling Laws for Neural Language Models | Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | | 611 4,905 0 | 23 Jan 2020
Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm | S. Spigler, Mario Geiger, Matthieu Wyart | | 68 38 0 | 26 May 2019