2102.01293
Cited By
Scaling Laws for Transfer
2 February 2021
Danny Hernandez
Jared Kaplan
T. Henighan
Sam McCandlish
Papers citing "Scaling Laws for Transfer"
50 / 67 papers shown
Title
Learning Dynamics in Continual Pre-Training for Large Language Models
Xingjin Wang
Howe Tissue
Lu Wang
Linjing Li
D. Zeng
CLL
34
0
0
12 May 2025
A Scaling Law for Token Efficiency in LLM Fine-Tuning Under Fixed Compute Budgets
Ryan Lagasse
Aidan Kiernans
Avijit Ghosh
Shiri Dori-Hacohen
31
0
0
09 May 2025
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
34
0
0
02 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
136
0
0
01 May 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
171
0
0
21 Apr 2025
Florenz: Scaling Laws for Systematic Generalization in Vision-Language Models
Julian Spravil
Sebastian Houben
Sven Behnke
VLM
75
0
0
12 Mar 2025
Predictable Artificial Intelligence
Lexin Zhou
Pablo Antonio Moreno Casares
Fernando Martínez-Plumed
John Burden
Ryan Burnell
...
Seán Ó hÉigeartaigh
Danaja Rutar
Wout Schellaert
Konstantinos Voudouris
José Hernández-Orallo
51
2
0
08 Jan 2025
Compute-Constrained Data Selection
Junjie Oscar Yin
Alexander M. Rush
39
0
0
21 Oct 2024
GraphCLIP: Enhancing Transferability in Graph Foundation Models for Text-Attributed Graphs
Yun Zhu
Haizhou Shi
Xiaotang Wang
Yongchao Liu
Yaoke Wang
Boci Peng
Chuntao Hong
Siliang Tang
VLM
60
7
0
14 Oct 2024
Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen
Binxuan Huang
Yifan Gao
Zhengyang Wang
Jingfeng Yang
Heng Ji
LRM
53
9
0
11 Oct 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang
Philip Torr
Mohamed Elhoseiny
Adel Bibi
88
9
0
27 Aug 2024
Reasoning in Large Language Models: A Geometric Perspective
Romain Cosentino
Sarath Shekkizhar
LRM
44
2
0
02 Jul 2024
Spectral regularization for adversarially-robust representation learning
Sheng Yang
Jacob A. Zavatone-Veth
Cengiz Pehlevan
AAML
OOD
49
0
0
27 May 2024
Scaling Laws for Discriminative Classification in Large Language Models
Dean Wyatte
Fatemeh Tahmasbi
Ming Li
Thomas Markovich
47
2
0
24 May 2024
Temporal Scaling Law for Large Language Models
Yizhe Xiong
Xiansheng Chen
Xin Ye
Hui Chen
Zijia Lin
...
Zhenpeng Su
Wei Huang
Jianwei Niu
J. Han
Guiguang Ding
43
9
0
27 Apr 2024
Understanding Optimal Feature Transfer via a Fine-Grained Bias-Variance Analysis
Yufan Li
Subhabrata Sen
Ben Adlam
MLT
51
1
0
18 Apr 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
57
64
0
25 Mar 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models
Adam Ibrahim
Benjamin Thérien
Kshitij Gupta
Mats L. Richter
Quentin Anthony
Timothée Lesort
Eugene Belilovsky
Irina Rish
KELM
CLL
44
54
0
13 Mar 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
246
69
0
31 Dec 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
39
7
0
21 Nov 2023
Uncovering Neural Scaling Laws in Molecular Representation Learning
Dingshuo Chen
Yanqiao Zhu
Jieyu Zhang
Yuanqi Du
Zhixun Li
Qiang Liu
Shu Wu
Liang Wang
32
16
0
15 Sep 2023
Position: Key Claims in LLM Research Have a Long Tail of Footnotes
Anna Rogers
A. Luccioni
53
19
0
14 Aug 2023
FedYolo: Augmenting Federated Learning with Pretrained Transformers
Xuechen Zhang
Mingchen Li
Xiangyu Chang
Jiasi Chen
A. Roy-Chowdhury
A. Suresh
Samet Oymak
FedML
31
7
0
10 Jul 2023
Improving Language Plasticity via Pretraining with Active Forgetting
Yihong Chen
Kelly Marchisio
Roberta Raileanu
David Ifeoluwa Adelani
Pontus Stenetorp
Sebastian Riedel
Mikel Artetxe
KELM
AI4CE
CLL
30
24
0
03 Jul 2023
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raff
35
117
0
21 Apr 2023
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence
Gengchen Mai
Weiming Huang
Jin Sun
Suhang Song
Deepak Mishra
...
Yingjie Hu
Chris Cundy
Ziyuan Li
Rui Zhu
Ni Lao
AI4CE
32
123
0
13 Apr 2023
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
85
789
0
30 Mar 2023
Fine-Tashkeel: Finetuning Byte-Level Models for Accurate Arabic Text Diacritization
Bashar Al-Rfooh
Gheith A. Abandah
Rami Al-Rfou
26
4
0
25 Mar 2023
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Tyna Eloundou
Sam Manning
Pamela Mishkin
Daniel Rock
ELM
44
382
0
17 Mar 2023
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
38
29
0
19 Feb 2023
Cliff-Learning
T. T. Wang
I. Zablotchi
Nir Shavit
Jonathan S. Rosenfeld
44
0
0
14 Feb 2023
The unreasonable effectiveness of few-shot learning for machine translation
Xavier Garcia
Yamini Bansal
Colin Cherry
George F. Foster
M. Krikun
Fan Feng
Melvin Johnson
Orhan Firat
38
102
0
02 Feb 2023
A Solvable Model of Neural Scaling Laws
A. Maloney
Daniel A. Roberts
J. Sully
38
51
0
30 Oct 2022
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
30
74
0
26 Oct 2022
Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos
A. Ho
J. Sevilla
T. Besiroglu
Lennart Heim
Marius Hobbhahn
ALM
44
111
0
26 Oct 2022
Same Pre-training Loss, Better Downstream: Implicit Bias Matters for Language Models
Hong Liu
Sang Michael Xie
Zhiyuan Li
Tengyu Ma
AI4CE
40
49
0
25 Oct 2022
Precision Machine Learning
Eric J. Michaud
Ziming Liu
Max Tegmark
24
34
0
24 Oct 2022
Scaling Laws for Reward Model Overoptimization
Leo Gao
John Schulman
Jacob Hilton
ALM
41
481
0
19 Oct 2022
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them
Mirac Suzgun
Nathan Scales
Nathanael Scharli
Sebastian Gehrmann
Yi Tay
...
Aakanksha Chowdhery
Quoc V. Le
Ed H. Chi
Denny Zhou
Jason W. Wei
ALM
ELM
LRM
ReLM
116
1,011
0
17 Oct 2022
Scaling Laws for a Multi-Agent Reinforcement Learning Model
Oren Neumann
C. Gros
32
26
0
29 Sep 2022
Local Grammar-Based Coding Revisited
L. Debowski
33
0
0
27 Sep 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay
Mostafa Dehghani
Samira Abnar
Hyung Won Chung
W. Fedus
J. Rao
Sharan Narang
Vinh Q. Tran
Dani Yogatama
Donald Metzler
AI4CE
34
100
0
21 Jul 2022
Memorization Without Overfitting: Analyzing the Training Dynamics of Large Language Models
Kushal Tirumala
Aram H. Markosyan
Luke Zettlemoyer
Armen Aghajanyan
TDI
29
187
0
22 May 2022
Empirical Evaluation and Theoretical Analysis for Representation Learning: A Survey
Kento Nozawa
Issei Sato
AI4TS
24
4
0
18 Apr 2022
Training a Helpful and Harmless Assistant with Reinforcement Learning from Human Feedback
Yuntao Bai
Andy Jones
Kamal Ndousse
Amanda Askell
Anna Chen
...
Jack Clark
Sam McCandlish
C. Olah
Benjamin Mann
Jared Kaplan
77
2,341
0
12 Apr 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
69
1,846
0
29 Mar 2022
Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments
Maor Ivgi
Y. Carmon
Jonathan Berant
19
17
0
13 Feb 2022
Unified Scaling Laws for Routed Language Models
Aidan Clark
Diego de Las Casas
Aurelia Guy
A. Mensch
Michela Paganini
...
Oriol Vinyals
Jack W. Rae
Erich Elsen
Koray Kavukcuoglu
Karen Simonyan
MoE
27
177
0
02 Feb 2022
The Effects of Reward Misspecification: Mapping and Mitigating Misaligned Models
Alexander Pan
Kush S. Bhatia
Jacob Steinhardt
53
171
0
10 Jan 2022
Can Multilinguality benefit Non-autoregressive Machine Translation?
Sweta Agrawal
Julia Kreutzer
Colin Cherry
AI4CE
29
1
0
16 Dec 2021