2001.08361
Cited By
Scaling Laws for Neural Language Models
23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
Papers citing
"Scaling Laws for Neural Language Models"
50 / 359 papers shown
Title
Training-Free Multi-Step Audio Source Separation
Yongyi Zang
Jingyi Li
Qiuqiang Kong
222
0
0
26 May 2025
Large Language Models' Reasoning Stalls: An Investigation into the Capabilities of Frontier Models
Lachlan McGinness
Peter Baumgartner
ReLM
LRM
ELM
64
0
0
26 May 2025
Do Large Language Models (Really) Need Statistical Foundations?
Weijie Su
251
0
0
25 May 2025
An AI Capability Threshold for Rent-Funded Universal Basic Income in an AI-Automated Economy
Aran Nayebi
32
0
0
24 May 2025
Inference Compute-Optimal Video Vision Language Models
Peiqi Wang
ShengYun Peng
Xuewen Zhang
Hanchao Yu
Yibo Yang
Lifu Huang
Fujun Liu
Qifan Wang
VLM
80
0
0
24 May 2025
Data Mixing Can Induce Phase Transitions in Knowledge Acquisition
Xinran Gu
Kaifeng Lyu
Jiazheng Li
Jingzhao Zhang
62
0
0
23 May 2025
Scaling Image and Video Generation via Test-Time Evolutionary Search
Haoran He
Jiajun Liang
X. Wang
Pengfei Wan
Di Zhang
Kun Gai
Ling Pan
DiffM
210
0
0
23 May 2025
Stable Reinforcement Learning for Efficient Reasoning
Muzhi Dai
Shixuan Liu
Qingyi Si
OffRL
LRM
99
0
0
23 May 2025
A Coreset Selection of Coreset Selection Literature: Introduction and Recent Advances
Brian B. Moser
Arundhati S. Shanbhag
Stanislav Frolov
Federico Raue
Joachim Folz
Andreas Dengel
230
0
0
23 May 2025
The emergence of sparse attention: impact of data distribution and benefits of repetition
Nicolas Zucchet
Francesco D'Angelo
Andrew Kyle Lampinen
Stephanie C. Y. Chan
198
0
0
23 May 2025
L-MTP: Leap Multi-Token Prediction Beyond Adjacent Context for Large Language Models
Xiaohao Liu
Xiaobo Xia
Weixiang Zhao
Manyi Zhang
Xianzhi Yu
Xiu Su
Shuo Yang
See-Kiong Ng
Tat-Seng Chua
KELM
LRM
73
0
0
23 May 2025
Scaling Up Biomedical Vision-Language Models: Fine-Tuning, Instruction Tuning, and Multi-Modal Learning
Cheng Peng
Kai Zhang
Mengxian Lyu
Hongfang Liu
Lichao Sun
Yonghui Wu
LM&MA
MedIm
VLM
257
0
0
23 May 2025
Large Language Models Do Multi-Label Classification Differently
Marcus Ma
Georgios Chochlakis
Niyantha Maruthu Pandiyan
Jesse Thomason
Shrikanth Narayanan
77
1
0
23 May 2025
AdamS: Momentum Itself Can Be A Normalizer for LLM Pretraining and Post-training
Huishuai Zhang
Bohan Wang
Luoxin Chen
ODL
219
0
0
22 May 2025
FoMoH: A clinically meaningful foundation model evaluation for structured electronic health records
Chao Pang
Vincent Jeanselme
Young Sang Choi
Xinzhuo Jiang
Zilin Jing
...
Yuta Kobayashi
Yanwei Li
Florent Pollet
Karthik Natarajan
Shalmali Joshi
199
0
0
22 May 2025
Stronger ViTs With Octic Equivariance
David Nordström
Johan Edstedt
Fredrik Kahl
Georg Bökman
ViT
206
0
0
21 May 2025
Self-GIVE: Associative Thinking from Limited Structured Knowledge for Enhanced Large Language Model Reasoning
Jiashu He
Jinxuan Fan
Bowen Jiang
Ignacio Hounie
Dan Roth
Alejandro Ribeiro
ReLM
RALM
LRM
82
2
0
21 May 2025
Exploring Causes of Representational Similarity in Machine Learning Models
Zeyu Michael Li
Hung Anh Vu
Damilola Awofisayo
Emily Wenger
CML
221
0
0
20 May 2025
Large Language Models Implicitly Learn to See and Hear Just By Reading
Prateek Verma
Mert Pilanci
170
0
0
20 May 2025
Quartet: Native FP4 Training Can Be Optimal for Large Language Models
Roberto L. Castro
Andrei Panferov
Soroush Tabesh
Oliver Sieberling
Jiale Chen
Mahdi Nikdan
Saleh Ashkboos
Dan Alistarh
MQ
65
0
0
20 May 2025
Panda: A pretrained forecast model for universal representation of chaotic dynamics
Jeffrey Lai
Anthony Bao
William Gilpin
AI4TS
AI4CE
82
0
0
19 May 2025
ZenFlow: Enabling Stall-Free Offloading Training via Asynchronous Updates
Tingfeng Lan
Yusen Wu
Bin Ma
Zhaoyuan Su
Rui Yang
Tekin Bicer
Dong Li
Yue Cheng
193
0
0
18 May 2025
Video-GPT via Next Clip Diffusion
Shaobin Zhuang
Zhipeng Huang
Ying Zhang
Fangyikang Wang
Canmiao Fu
Binxin Yang
Chong Sun
Chen Li
Yali Wang
DiffM
VGen
215
0
0
18 May 2025
JULI: Jailbreak Large Language Models by Self-Introspection
Jesson Wang
Zhanhao Hu
David Wagner
75
0
0
17 May 2025
Chain-of-Model Learning for Language Model
Kaitao Song
Xiaohua Wang
Xu Tan
Huiqiang Jiang
Chengruidong Zhang
...
Xiaoqing Zheng
Tao Qin
Yuqing Yang
Dongsheng Li
Lili Qiu
LRM
AI4CE
150
1
0
17 May 2025
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
99
0
0
13 May 2025
Learning curves theory for hierarchically compositional data with power-law distributed features
Francesco Cagnetta
Hyunmo Kang
Matthieu Wyart
98
1
0
11 May 2025
Can LLM-based Financial Investing Strategies Outperform the Market in Long Run?
Weixian Waylon Li
Hyeonjun Kim
Mihai Cucuringu
Tiejun Ma
AIFin
159
0
0
11 May 2025
Prompt Engineering: How Prompt Vocabulary affects Domain Knowledge
Dimitri Schreiter
61
1
0
10 May 2025
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Dhruvesh Patel
Aishwarya Sahoo
Avinash Amballa
Tahira Naseem
Tim G. J. Rudner
Andrew McCallum
KELM
88
0
0
09 May 2025
Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model
Mingruo Yuan
Ben Kao
Tien-Hsuan Wu
Michael M. K. Cheung
Henry W. H. Chan
Anne S. Y. Cheung
Felix W. H. Chan
Yongxi Chen
AILaw
ELM
377
3
0
07 May 2025
Towards Large-scale Generative Ranking
Yanhua Huang
Yuxiao Chen
Xiong Cao
Rui Yang
Mingliang Qi
...
L. Chen
Weihang Chen
Min Zhu
Ruiwen Xu
Lei Zhang
100
0
0
07 May 2025
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Kyle Lampinen
Arslan Chaudhry
Stephanie Chan
Cody Wild
Diane Wan
Alex Ku
Jorg Bornschein
Razvan Pascanu
Murray Shanahan
James L. McClelland
136
5
0
01 May 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
93
0
0
29 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
Siyang Song
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MA
ELM
153
0
0
28 Apr 2025
Scaling Laws For Scalable Oversight
Joshua Engels
David D. Baek
Subhash Kantamneni
Max Tegmark
ELM
147
0
0
25 Apr 2025
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Li Sheng
Xuekai Zhu
...
Youbang Sun
Zhiyuan Ma
Lifan Yuan
Ning Ding
Bowen Zhou
OffRL
331
24
0
22 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq Joty
ELM
ALM
LRM
123
4
0
21 Apr 2025
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
395
0
0
21 Apr 2025
Position: An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
Patrik Reizinger
Randall Balestriero
David Klindt
Wieland Brendel
139
0
0
17 Apr 2025
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda
Vatsal Baherwani
Zain Sarwar
Benjamin Thérien
Supriyo Chakraborty
Tom Goldstein
MoE
98
0
0
16 Apr 2025
CameraBench: Benchmarking Visual Reasoning in MLLMs via Photography
I-Sheng Fang
Jun-Cheng Chen
LRM
VLM
93
0
0
14 Apr 2025
Towards Combinatorial Interpretability of Neural Computation
Micah Adler
Dan Alistarh
Nir Shavit
FAtt
326
2
0
10 Apr 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
167
120
0
10 Apr 2025
Pretraining Language Models for Diachronic Linguistic Change Discovery
Elisabeth Fittschen
Sabrina Li
Tom Lippincott
Leshem Choshen
Craig Messner
152
0
0
07 Apr 2025
Pre-training Generative Recommender with Multi-Identifier Item Tokenization
Bowen Zheng
Enze Liu
Zhongfu Chen
Zhongrui Ma
Yue Wang
Wayne Xin Zhao
Ji-Rong Wen
119
0
0
06 Apr 2025
Universal Item Tokenization for Transferable Generative Recommendation
Bowen Zheng
Hongyu Lu
Yu Chen
Wayne Xin Zhao
Ji-Rong Wen
96
0
0
06 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
124
1
0
05 Apr 2025
MiLo: Efficient Quantized MoE Inference with Mixture of Low-Rank Compensators
Beichen Huang
Yueming Yuan
Zelei Shao
Minjia Zhang
MQ
MoE
90
0
0
03 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
227
0
0
03 Apr 2025