ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.11840
  4. Cited By
A Hitchhiker's Guide to Scaling Law Estimation
v1v2 (latest)

A Hitchhiker's Guide to Scaling Law Estimation

15 October 2024
Leshem Choshen
Yang Zhang
Jacob Andreas
ArXiv (abs)PDFHTML

Papers citing "A Hitchhiker's Guide to Scaling Law Estimation"

48 / 48 papers shown
Title
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
190
120
0
10 Apr 2025
Scaling Laws for Downstream Task Performance in Machine Translation
Scaling Laws for Downstream Task Performance in Machine Translation
Berivan Isik
Natalia Ponomareva
Hussein Hazimeh
Dimitris Paparas
Sergei Vassilvitskii
Sanmi Koyejo
160
23
0
24 Feb 2025
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
How to Upscale Neural Networks with Scaling Law? A Survey and Practical Guidelines
Ayan Sengupta
Ayan Sengupta
Tanmoy Chakraborty
145
0
0
17 Feb 2025
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Sloth: scaling laws for LLM skills to predict multi-benchmark performance across families
Felipe Maia Polo
Shivalika Singh
Leshem Choshen
Yuekai Sun
Mikhail Yurochkin
206
8
0
09 Dec 2024
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on
  Developmentally Plausible Corpora
Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu
Aaron Mueller
Candace Ross
Adina Williams
Tal Linzen
Chengxu Zhuang
Ryan Cotterell
Leshem Choshen
Alex Warstadt
Ethan Gotlieb Wilcox
169
14
0
06 Dec 2024
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
u-μ\muμP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
107
10
0
24 Jul 2024
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Scaling Laws with Vocabulary: Larger Models Deserve Larger Vocabularies
Chaofan Tao
Qian Liu
Longxu Dou
Niklas Muennighoff
Zhongwei Wan
Ping Luo
Min Lin
Ngai Wong
PILM
110
54
0
18 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
162
26
0
27 Jun 2024
Time Matters: Scaling Laws for Any Budget
Time Matters: Scaling Laws for Any Budget
Itay Inbar
Luke Sernau
45
2
0
27 Jun 2024
gzip Predicts Data-dependent Scaling Laws
gzip Predicts Data-dependent Scaling Laws
Rohan Pandey
79
11
0
26 May 2024
Super Tiny Language Models
Super Tiny Language Models
Dylan Hillier
Leon Guertler
Cheston Tan
Palaash Agrawal
Ruirui Chen
Bobby Cheng
100
6
0
23 May 2024
Temporal Scaling Law for Large Language Models
Temporal Scaling Law for Large Language Models
Yizhe Xiong
Xiansheng Chen
Xin Ye
Hui Chen
Zijia Lin
...
Zhenpeng Su
Wei Huang
Jianwei Niu
Jiawei Han
Guiguang Ding
111
10
0
27 Apr 2024
Chinchilla Scaling: A replication attempt
Chinchilla Scaling: A replication attempt
T. Besiroglu
Ege Erdil
Matthew Barnett
Josh You
94
24
0
15 Apr 2024
MiniCPM: Unveiling the Potential of Small Language Models with Scalable
  Training Strategies
MiniCPM: Unveiling the Potential of Small Language Models with Scalable Training Strategies
Shengding Hu
Yuge Tu
Xu Han
Chaoqun He
Ganqu Cui
...
Chaochao Jia
Guoyang Zeng
Dahai Li
Zhiyuan Liu
Maosong Sun
MoE
107
345
0
09 Apr 2024
Emergent Abilities in Reduced-Scale Generative Language Models
Emergent Abilities in Reduced-Scale Generative Language Models
Sherin Muckatira
Vijeta Deshpande
Vladislav Lialin
Anna Rumshisky
ReLMELMLRM
51
5
0
02 Apr 2024
Language models scale reliably with over-training and on downstream
  tasks
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALMELMLRM
168
48
0
13 Mar 2024
When Scaling Meets LLM Finetuning: The Effect of Data, Model and
  Finetuning Method
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Biao Zhang
Zhongtao Liu
Colin Cherry
Orhan Firat
LRM
108
157
0
27 Feb 2024
Scaling Laws for Fine-Grained Mixture of Experts
Scaling Laws for Fine-Grained Mixture of Experts
Jakub Krajewski
Jan Ludziejewski
Kamil Adamczewski
Maciej Pióro
Michal Krutul
...
Krystian Król
Tomasz Odrzygó'zd'z
Piotr Sankowski
Marek Cygan
Sebastian Jaszczur
MoE
99
60
0
12 Feb 2024
Repeat After Me: Transformers are Better than State Space Models at
  Copying
Repeat After Me: Transformers are Better than State Space Models at Copying
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
163
95
0
01 Feb 2024
In-Context Language Learning: Architectures and Algorithms
In-Context Language Learning: Architectures and Algorithms
Ekin Akyürek
Bailin Wang
Yoon Kim
Jacob Andreas
LRMReLM
115
56
0
23 Jan 2024
How predictable is language model benchmark performance?
How predictable is language model benchmark performance?
David Owen
ELMLRM
85
22
0
09 Jan 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
358
88
0
31 Dec 2023
LLM360: Towards Fully Transparent Open-Source LLMs
LLM360: Towards Fully Transparent Open-Source LLMs
Zhengzhong Liu
Aurick Qiao
Willie Neiswanger
Hongyi Wang
Bowen Tan
...
Zhiting Hu
Mark Schulze
Preslav Nakov
Timothy Baldwin
Eric Xing
119
78
0
11 Dec 2023
Small-scale proxies for large-scale Transformer training instabilities
Small-scale proxies for large-scale Transformer training instabilities
Mitchell Wortsman
Peter J. Liu
Lechao Xiao
Katie Everett
A. Alemi
...
Jascha Narain Sohl-Dickstein
Kelvin Xu
Jaehoon Lee
Justin Gilmer
Simon Kornblith
103
99
0
25 Sep 2023
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and
  Simplicity Bias in MLMs
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen
Ravid Schwartz-Ziv
Kyunghyun Cho
Matthew L. Leavitt
Naomi Saphra
85
74
0
13 Sep 2023
Efficient Benchmarking of Language Models
Efficient Benchmarking of Language Models
Yotam Perlitz
Elron Bandel
Ariel Gera
Ofir Arviv
L. Ein-Dor
Eyal Shnarch
Noam Slonim
Michal Shmueli-Scheuer
Leshem Choshen
ALM
75
28
0
22 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MHALM
413
12,076
0
18 Jul 2023
Scaling Data-Constrained Language Models
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
126
226
0
25 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
121
64
0
22 May 2023
Pythia: A Suite for Analyzing Large Language Models Across Training and
  Scaling
Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling
Stella Biderman
Hailey Schoelkopf
Quentin G. Anthony
Herbie Bradley
Kyle O'Brien
...
USVSN Sai Prashanth
Edward Raff
Aviya Skowron
Lintang Sutawika
Oskar van der Wal
110
1,307
0
03 Apr 2023
Training Trajectories of Language Models Across Scales
Training Trajectories of Language Models Across Scales
Mengzhou Xia
Mikel Artetxe
Chunting Zhou
Xi Lin
Ramakanth Pasunuru
Danqi Chen
Luke Zettlemoyer
Ves Stoyanov
AIFinLRM
89
64
0
19 Dec 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
417
2,393
0
09 Nov 2022
Broken Neural Scaling Laws
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
82
76
0
26 Oct 2022
Revisiting Neural Scaling Laws in Language and Vision
Revisiting Neural Scaling Laws in Language and Vision
Ibrahim Alabdulmohsin
Behnam Neyshabur
Xiaohua Zhai
222
111
0
13 Sep 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence
  Scaling?
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay
Mostafa Dehghani
Samira Abnar
Hyung Won Chung
W. Fedus
J. Rao
Sharan Narang
Vinh Q. Tran
Dani Yogatama
Donald Metzler
AI4CE
111
106
0
21 Jul 2022
Beyond neural scaling laws: beating power law scaling via data pruning
Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher
Robert Geirhos
Shashank Shekhar
Surya Ganguli
Ari S. Morcos
100
444
0
29 Jun 2022
RITA: a Study on Scaling Up Generative Protein Sequence Models
RITA: a Study on Scaling Up Generative Protein Sequence Models
Daniel Hesslow
Niccoló Zanichelli
Pascal Notin
Iacopo Poli
D. Marks
56
99
0
11 May 2022
Training Compute-Optimal Large Language Models
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
208
1,987
0
29 Mar 2022
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot
  Hyperparameter Transfer
Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer
Greg Yang
J. E. Hu
Igor Babuschkin
Szymon Sidor
Xiaodong Liu
David Farhi
Nick Ryder
J. Pachocki
Weizhu Chen
Jianfeng Gao
114
168
0
07 Mar 2022
Scaling Laws Under the Microscope: Predicting Transformer Performance
  from Small Scale Experiments
Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments
Maor Ivgi
Y. Carmon
Jonathan Berant
72
17
0
13 Feb 2022
Exploring the Limits of Large Scale Pre-training
Exploring the Limits of Large Scale Pre-training
Samira Abnar
Mostafa Dehghani
Behnam Neyshabur
Hanie Sedghi
AI4CE
103
119
0
05 Oct 2021
The Grammar-Learning Trajectories of Neural Language Models
The Grammar-Learning Trajectories of Neural Language Models
Leshem Choshen
Guy Hacohen
D. Weinshall
Omri Abend
97
29
0
13 Sep 2021
A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your
  Pre-training Effective?
A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?
Hiroaki Mikami
Kenji Fukumizu
Shogo Murai
Shuji Suzuki
Yuta Kikuchi
Taiji Suzuki
S. Maeda
Kohei Hayashi
78
12
0
25 Aug 2021
The MultiBERTs: BERT Reproductions for Robustness Analysis
The MultiBERTs: BERT Reproductions for Robustness Analysis
Thibault Sellam
Steve Yadlowsky
Jason W. Wei
Naomi Saphra
Alexander DÁmour
...
Iulia Turc
Jacob Eisenstein
Dipanjan Das
Ian Tenney
Ellie Pavlick
104
95
0
30 Jun 2021
Scaling Laws for Transfer
Scaling Laws for Transfer
Danny Hernandez
Jared Kaplan
T. Henighan
Sam McCandlish
95
251
0
02 Feb 2021
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
904
42,520
0
28 May 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
651
4,925
0
23 Jan 2020
A Constructive Prediction of the Generalization Error Across Scales
A Constructive Prediction of the Generalization Error Across Scales
Jonathan S. Rosenfeld
Amir Rosenfeld
Yonatan Belinkov
Nir Shavit
105
215
0
27 Sep 2019
1