arXiv:2305.16264
Scaling Data-Constrained Language Models
25 May 2023
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
Papers citing
"Scaling Data-Constrained Language Models"
50 / 58 papers shown
Title
Generalization Bounds and Stopping Rules for Learning with Self-Selected Data
Julian Rodemann
James Bailie
SSL
53
1
0
12 May 2025
Incentivizing Inclusive Contributions in Model Sharing Markets
Enpei Zhang
Jingyi Chai
Rui Ye
Yanfeng Wang
Siheng Chen
TDI
FedML
144
0
0
05 May 2025
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
34
0
0
02 May 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
57
0
0
29 Apr 2025
Privacy-Preserving Federated Embedding Learning for Localized Retrieval-Augmented Generation
Qianren Mao
Qili Zhang
Hanwen Hao
Zhentao Han
Runhua Xu
...
Jing Chen
Yangqiu Song
Jin Dong
Jianxin Li
Philip S. Yu
71
1
0
27 Apr 2025
MAGIC: Near-Optimal Data Attribution for Deep Learning
Andrew Ilyas
Logan Engstrom
TDI
39
0
0
23 Apr 2025
ToReMi: Topic-Aware Data Reweighting for Dynamic Pre-Training Data Selection
Xiaoxuan Zhu
Zhouhong Gu
Baiqian Wu
Suhang Zheng
Tao Wang
Tianyu Li
Hongwei Feng
Yanghua Xiao
42
0
0
01 Apr 2025
SimPER: A Minimalist Approach to Preference Alignment without Hyperparameters
Teng Xiao
Yige Yuan
Z. Chen
Mingxiao Li
Shangsong Liang
Z. Ren
V. Honavar
95
5
0
21 Feb 2025
AURORA: Automated Training Framework of Universal Process Reward Models via Ensemble Prompting and Reverse Verification
Xiaoyu Tan
Tianchu Yao
C. Qu
Bin Li
Minghao Yang
...
Haozhe Wang
Xihe Qiu
Wei Chu
Yinghui Xu
Yuan Qi
OffRL
LRM
49
2
0
17 Feb 2025
Position: Multimodal Large Language Models Can Significantly Advance Scientific Reasoning
Yibo Yan
Shen Wang
Jiahao Huo
Jingheng Ye
Zhendong Chu
Xuming Hu
Philip S. Yu
Carla P. Gomes
B. Selman
Qingsong Wen
LRM
127
9
0
05 Feb 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
108
141
0
22 Jan 2025
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
77
8
0
29 Oct 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu
Hao Zhou
Fandong Meng
Jie Zhou
Minlie Huang
67
5
0
22 Oct 2024
Compute-Constrained Data Selection
Junjie Oscar Yin
Alexander M. Rush
39
0
0
21 Oct 2024
Scaling laws for post-training quantized large language models
Zifei Xu
Alexander Lan
W. Yazar
T. Webb
Sayeh Sharify
Xin Wang
MQ
28
0
0
15 Oct 2024
Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen
Binxuan Huang
Yifan Gao
Zhengyang Wang
Jingfeng Yang
Heng Ji
LRM
50
8
0
11 Oct 2024
MACPO: Weak-to-Strong Alignment via Multi-Agent Contrastive Preference Optimization
Yougang Lyu
Lingyong Yan
Zihan Wang
Dawei Yin
Pengjie Ren
Maarten de Rijke
Z. Z. Ren
60
6
0
10 Oct 2024
Detecting Training Data of Large Language Models via Expectation Maximization
Gyuwan Kim
Yang Li
Evangelia Spiliopoulou
Jie Ma
Miguel Ballesteros
William Yang Wang
MIALM
95
4
2
10 Oct 2024
Data Selection via Optimal Control for Language Models
Yuxian Gu
Li Dong
Hongning Wang
Y. Hao
Qingxiu Dong
Furu Wei
Minlie Huang
AI4CE
55
4
0
09 Oct 2024
The Optimization Landscape of SGD Across the Feature Learning Strength
Alexander B. Atanasov
Alexandru Meterez
James B. Simon
C. Pehlevan
43
2
0
06 Oct 2024
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis
Ulyana Piterbarg
Lerrel Pinto
Rob Fergus
SyDa
37
2
0
03 Oct 2024
Scaling Optimal LR Across Token Horizons
Johan Bjorck
Alon Benhaim
Vishrav Chaudhary
Furu Wei
Xia Song
54
4
0
30 Sep 2024
Task-Adaptive Pretrained Language Models via Clustered-Importance Sampling
David Grangier
Simin Fan
Skyler Seto
Pierre Ablin
44
3
0
30 Sep 2024
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
C. Pehlevan
57
12
0
26 Sep 2024
Model Agnostic Hybrid Sharding For Heterogeneous Distributed Inference
Claudio Angione
Yue Zhao
Harry Yang
Ahmad Farhan
Fielding Johnston
James Buban
Patrick Colangelo
42
1
0
29 Jul 2024
Beyond Next Token Prediction: Patch-Level Training for Large Language Models
Chenze Shao
Fandong Meng
Jie Zhou
46
1
0
17 Jul 2024
DeepGate3: Towards Scalable Circuit Representation Learning
Zhengyuan Shi
Ziyang Zheng
Sadaf Khan
Qiang Xu
Min Li
Qiang Xu
GNN
AI4CE
41
8
0
15 Jul 2024
Leveraging Large Language Models for Integrated Satellite-Aerial-Terrestrial Networks: Recent Advances and Future Directions
Shumaila Javaid
R. A. Khalil
Nasir Saeed
Bin He
Mohamed-Slim Alouini
39
9
0
05 Jul 2024
RegMix: Data Mixture as Regression for Language Model Pre-training
Qian Liu
Xiaosen Zheng
Niklas Muennighoff
Guangtao Zeng
Longxu Dou
Tianyu Pang
Jing Jiang
Min-Bin Lin
MoE
74
40
1
01 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
60
20
0
27 Jun 2024
Scaling Laws in Linear Regression: Compute, Parameters, and Data
Licong Lin
Jingfeng Wu
Sham Kakade
Peter L. Bartlett
Jason D. Lee
LRM
41
15
0
12 Jun 2024
Scaling Large Language Model-based Multi-Agent Collaboration
Chen Qian
Zihao Xie
YiFei Wang
Wei Liu
Yufan Dang
...
Zhuoyun Du
Weize Chen
Cheng Yang
Zhiyuan Liu
Maosong Sun
AI4CE
LLMAG
LM&Ro
61
45
0
11 Jun 2024
Perplexed by Perplexity: Perplexity-Based Data Pruning With Small Reference Models
Zachary Ankner
Cody Blakeney
Kartik K. Sreenivasan
Max Marion
Matthew L. Leavitt
Mansheej Paul
43
24
0
30 May 2024
Scaling Laws for Discriminative Classification in Large Language Models
Dean Wyatte
Fatemeh Tahmasbi
Ming Li
Thomas Markovich
41
2
0
24 May 2024
Asymptotic theory of in-context learning by linear attention
Yue M. Lu
Mary I. Letey
Jacob A. Zavatone-Veth
Anindita Maiti
C. Pehlevan
29
10
0
20 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory
Xueyan Niu
Bo Bai
Lei Deng
Wei Han
36
6
0
14 May 2024
DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger
Ofek Glick
Chaim Baskin
Yonatan Belinkov
67
0
0
13 May 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
49
62
0
25 Mar 2024
Balanced Data Sampling for Language Model Training with Clustering
Yunfan Shao
Linyang Li
Zhaoye Fei
Hang Yan
Dahua Lin
Xipeng Qiu
37
8
0
22 Feb 2024
CroissantLLM: A Truly Bilingual French-English Language Model
Manuel Faysse
Patrick Fernandes
Nuno M. Guerreiro
António Loison
Duarte M. Alves
...
François Yvon
André F.T. Martins
Gautier Viaud
Céline Hudelot
Pierre Colombo
52
32
0
01 Feb 2024
Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu
Ting-En Lin
Yuchuan Wu
Min Yang
Fei Huang
Yongbin Li
ALM
40
9
0
10 Oct 2023
Diffusion Language Models Can Perform Many Tasks with Scaling and Instruction-Finetuning
Jiasheng Ye
Zaixiang Zheng
Yu Bao
Lihua Qian
Quanquan Gu
DiffM
54
14
0
23 Aug 2023
What Language Model to Train if You Have One Million GPU Hours?
Teven Le Scao
Thomas Wang
Daniel Hesslow
Lucile Saulnier
Stas Bekman
...
Lintang Sutawika
Jaesung Tae
Zheng-Xin Yong
Julien Launay
Iz Beltagy
MoE
AI4CE
230
103
0
27 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
250
1,073
0
05 Oct 2022
Revisiting Neural Scaling Laws in Language and Vision
Ibrahim M. Alabdulmohsin
Behnam Neyshabur
Xiaohua Zhai
159
102
0
13 Sep 2022
Scaling Laws vs Model Architectures: How does Inductive Bias Influence Scaling?
Yi Tay
Mostafa Dehghani
Samira Abnar
Hyung Won Chung
W. Fedus
J. Rao
Sharan Narang
Vinh Q. Tran
Dani Yogatama
Donald Metzler
AI4CE
34
100
0
21 Jul 2022
Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset
Peter Henderson
M. Krass
Lucia Zheng
Neel Guha
Christopher D. Manning
Dan Jurafsky
Daniel E. Ho
AILaw
ELM
131
97
0
01 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
328
4,077
0
24 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
313
11,953
0
04 Mar 2022
PromptSource: An Integrated Development Environment and Repository for Natural Language Prompts
Stephen H. Bach
Victor Sanh
Zheng-Xin Yong
Albert Webson
Colin Raffel
...
Khalid Almubarak
Xiangru Tang
Dragomir R. Radev
Mike Tian-Jian Jiang
Alexander M. Rush
VLM
225
338
0
02 Feb 2022