arXiv:2411.12925
Loss-to-Loss Prediction: Scaling Laws for All Datasets
19 November 2024
David Brandfonbrener
Nikhil Anand
Nikhil Vyas
Eran Malach
Sham Kakade
Papers citing "Loss-to-Loss Prediction: Scaling Laws for All Datasets"
21 / 21 papers shown
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang
Allan Zhou
Zhili Feng
Sadhika Malladi
J. Zico Kolter
79
22
0
15 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
104
17
0
26 Sep 2024
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
103
25
0
10 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
123
26
0
27 Jun 2024
Scaling and renormalization in high-dimensional regression
Alexander B. Atanasov
Jacob A. Zavatone-Veth
Cengiz Pehlevan
60
20
0
01 May 2024
Chinchilla Scaling: A replication attempt
T. Besiroglu
Ege Erdil
Matthew Barnett
Josh You
78
24
0
15 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
110
76
0
25 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV
LRM
117
54
0
23 Mar 2024
A Dynamical Model of Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
96
44
0
02 Feb 2024
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
87
99
0
25 Sep 2023
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
208
1,980
0
29 Mar 2022
Datamodels: Predicting Predictions from Training Data
Andrew Ilyas
Sung Min Park
Logan Engstrom
Guillaume Leclerc
Aleksander Madry
TDI
131
141
0
01 Feb 2022
Covariate Shift in High-Dimensional Random Feature Regression
Nilesh Tripuraneni
Ben Adlam
Jeffrey Pennington
OOD
45
24
0
16 Nov 2021
Exploring the Limits of Large Scale Pre-training
Samira Abnar
Mostafa Dehghani
Behnam Neyshabur
Hanie Sedghi
AI4CE
97
119
0
05 Oct 2021
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
John Miller
Rohan Taori
Aditi Raghunathan
Shiori Sagawa
Pang Wei Koh
Vaishaal Shankar
Percy Liang
Y. Carmon
Ludwig Schmidt
OODD
OOD
91
278
0
09 Jul 2021
Explaining Neural Scaling Laws
Yasaman Bahri
Ethan Dyer
Jared Kaplan
Jaehoon Lee
Utkarsh Sharma
75
269
0
12 Feb 2021
Learning Curve Theory
Marcus Hutter
216
64
0
08 Feb 2021
Scaling Laws for Transfer
Danny Hernandez
Jared Kaplan
T. Henighan
Sam McCandlish
90
250
0
02 Feb 2021
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Blake Bordelon
Abdulkadir Canatar
Cengiz Pehlevan
235
208
0
07 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
611
4,905
0
23 Jan 2020
Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm
S. Spigler
Mario Geiger
Matthieu Wyart
68
38
0
26 May 2019