Loss-to-Loss Prediction: Scaling Laws for All Datasets


19 November 2024
David Brandfonbrener
Nikhil Anand
Nikhil Vyas
Eran Malach
Sham Kakade

Papers citing "Loss-to-Loss Prediction: Scaling Laws for All Datasets"

21 / 21 papers shown
Adaptive Data Optimization: Dynamic Sample Selection with Scaling Laws
Yiding Jiang
Allan Zhou
Zhili Feng
Sadhika Malladi
J. Zico Kolter
79
22
0
15 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
104
17
0
26 Sep 2024
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
103
25
0
10 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
123
26
0
27 Jun 2024
Scaling and renormalization in high-dimensional regression
Alexander B. Atanasov
Jacob A. Zavatone-Veth
Cengiz Pehlevan
60
20
0
01 May 2024
Chinchilla Scaling: A replication attempt
T. Besiroglu
Ege Erdil
Matthew Barnett
Josh You
78
24
0
15 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
110
76
0
25 Mar 2024
Understanding Emergent Abilities of Language Models from the Loss Perspective
Zhengxiao Du
Aohan Zeng
Yuxiao Dong
Jie Tang
UQCV, LRM
117
54
0
23 Mar 2024
A Dynamical Model of Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
96
44
0
02 Feb 2024
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
87
99
0
25 Sep 2023
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
208
1,980
0
29 Mar 2022
Datamodels: Predicting Predictions from Training Data
Andrew Ilyas
Sung Min Park
Logan Engstrom
Guillaume Leclerc
Aleksander Madry
TDI
131
141
0
01 Feb 2022
Covariate Shift in High-Dimensional Random Feature Regression
Nilesh Tripuraneni
Ben Adlam
Jeffrey Pennington
OOD
45
24
0
16 Nov 2021
Exploring the Limits of Large Scale Pre-training
Samira Abnar
Mostafa Dehghani
Behnam Neyshabur
Hanie Sedghi
AI4CE
97
119
0
05 Oct 2021
Accuracy on the Line: On the Strong Correlation Between Out-of-Distribution and In-Distribution Generalization
John Miller
Rohan Taori
Aditi Raghunathan
Shiori Sagawa
Pang Wei Koh
Vaishaal Shankar
Percy Liang
Y. Carmon
Ludwig Schmidt
OODD, OOD
91
278
0
09 Jul 2021
Explaining Neural Scaling Laws
Yasaman Bahri
Ethan Dyer
Jared Kaplan
Jaehoon Lee
Utkarsh Sharma
75
269
0
12 Feb 2021
Learning Curve Theory
Marcus Hutter
216
64
0
08 Feb 2021
Scaling Laws for Transfer
Danny Hernandez
Jared Kaplan
T. Henighan
Sam McCandlish
90
250
0
02 Feb 2021
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Blake Bordelon
Abdulkadir Canatar
Cengiz Pehlevan
235
208
0
07 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
611
4,905
0
23 Jan 2020
Asymptotic learning curves of kernel methods: empirical data v.s. Teacher-Student paradigm
S. Spigler
Mario Geiger
Matthieu Wyart
68
38
0
26 May 2019