2001.08361
Scaling Laws for Neural Language Models
23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
Papers citing
"Scaling Laws for Neural Language Models"
50 / 909 papers shown
Title
BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis
Jiarun Liu
Hong-Yu Zhou
Weijian Huang
Hao Yang
Dongning Song
Tao Tan
Yong Liang
Shanshan Wang
MedIm
23
0
0
14 May 2025
Resource-Efficient Language Models: Quantization for Fast and Accessible Inference
Tollef Emil Jørgensen
MQ
49
0
0
13 May 2025
Small but Significant: On the Promise of Small Language Models for Accessible AIED
Yumou Wei
Paulo Carvalho
John Stamper
SyDa
40
0
0
13 May 2025
ForeCite: Adapting Pre-Trained Language Models to Predict Future Citation Rates of Academic Papers
Gavin Hull
Alex Bihlo
24
0
0
13 May 2025
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
28
0
0
13 May 2025
Learning Dynamics in Continual Pre-Training for Large Language Models
Xingjin Wang
Howe Tissue
Lu Wang
Linjing Li
D. Zeng
CLL
29
0
0
12 May 2025
SpecRouter: Adaptive Routing for Multi-Level Speculative Decoding in Large Language Models
Hang Wu
Jianian Zhu
Y. Li
Haojie Wang
Biao Hou
Jidong Zhai
38
0
0
12 May 2025
Large Language Models and Arabic Content: A Review
Haneh Rhel
Dmitri Roussinov
29
0
0
12 May 2025
UMoE: Unifying Attention and FFN with Shared Experts
Yuanhang Yang
Chaozheng Wang
Jing Li
MoE
29
0
0
12 May 2025
Guiding Data Collection via Factored Scaling Curves
Lihan Zha
Apurva Badithela
Michael Zhang
Justin Lidard
Jeremy Bao
Emily Zhou
David Snyder
Allen Z. Ren
Dhruv Shah
Anirudha Majumdar
OffRL
34
0
0
12 May 2025
Learning curves theory for hierarchically compositional data with power-law distributed features
Francesco Cagnetta
Hyunmo Kang
M. Wyart
36
0
0
11 May 2025
The power of fine-grained experts: Granularity boosts expressivity in Mixture of Experts
Enric Boix Adserà
Philippe Rigollet
MoE
18
0
0
11 May 2025
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
Francesco Cagnetta
Alessandro Favero
Antonio Sclocchi
M. Wyart
26
0
0
11 May 2025
Unraveling Quantum Environments: Transformer-Assisted Learning in Lindblad Dynamics
Chi-Sheng Chen
En-Jui Kuo
AI4CE
24
0
0
11 May 2025
ProFashion: Prototype-guided Fashion Video Generation with Multiple Reference Images
Xianghao Kong
Qiaosong Qi
Yuanbin Wang
Anyi Rao
Biaolong Chen
Aixi Zhang
Si Liu
Hao Jiang
DiffM
VGen
25
0
0
10 May 2025
Text-to-CadQuery: A New Paradigm for CAD Generation with Scalable Large Model Capabilities
Haoyang Xie
Feng Ju
21
0
0
10 May 2025
Insertion Language Models: Sequence Generation with Arbitrary-Position Insertions
Dhruvesh Patel
Aishwarya Sahoo
Avinash Amballa
Tahira Naseem
Tim G. J. Rudner
Andrew McCallum
KELM
47
0
0
09 May 2025
Accelerating Diffusion Transformer via Increment-Calibrated Caching with Channel-Aware Singular Value Decomposition
Zhiyuan Chen
Keyi Li
Yifan Jia
Le Ye
Yufei Ma
DiffM
28
0
0
09 May 2025
Engineering Risk-Aware, Security-by-Design Frameworks for Assurance of Large-Scale Autonomous AI Models
Krti Tallam
31
0
0
09 May 2025
Crowding Out The Noise: Algorithmic Collective Action Under Differential Privacy
Rushabh Solanki
Meghana Bhange
Ulrich Aïvodji
Elliot Creager
29
0
0
09 May 2025
Scalable LLM Math Reasoning Acceleration with Low-rank Distillation
Harry Dong
Bilge Acun
Beidi Chen
Yuejie Chi
LRM
29
0
0
08 May 2025
Scaling Laws for Speculative Decoding
Siyuan Yan
Mo Zhu
Guo-qing Jiang
Jianfei Wang
Jiaxing Chen
...
Xiang Liao
Xiao Cui
Chen Zhang
Zhuoran Song
Ran Zhu
LRM
43
0
0
08 May 2025
clem:todd: A Framework for the Systematic Benchmarking of LLM-Based Task-Oriented Dialogue System Realisations
Chalamalasetti Kranti
Sherzod Hakimov
David Schlangen
LLMAG
42
0
0
08 May 2025
Towards Large-scale Generative Ranking
Yanhua Huang
Y. Chen
Xiong Cao
Rui Yang
Mingliang Qi
...
L. Chen
Weihang Chen
Min Zhu
Ruiwen Xu
Lei Zhang
45
0
0
07 May 2025
Bringing legal knowledge to the public by constructing a legal question bank using large-scale pre-trained language model
Mingruo Yuan
Ben Kao
Tien-Hsuan Wu
Michael M. K. Cheung
Henry W. H. Chan
Anne S. Y. Cheung
Felix W. H. Chan
Yongxi Chen
AILaw
ELM
121
3
0
07 May 2025
ORXE: Orchestrating Experts for Dynamically Configurable Efficiency
Qingyuan Wang
Guoxin Wang
B. Cardiff
Deepu John
38
0
0
07 May 2025
MergeGuard: Efficient Thwarting of Trojan Attacks in Machine Learning Models
Soheil Zibakhsh Shabgahi
Yaman Jandali
F. Koushanfar
MoMe
AAML
54
0
0
06 May 2025
The Steganographic Potentials of Language Models
Artem Karpov
Tinuade Adeleke
Seong Hah Cho
Natalia Perez-Campanero
32
0
0
06 May 2025
Quiet Feature Learning in Algorithmic Tasks
Prudhviraj Naidu
Zixian Wang
Leon Bergen
R. Paturi
VLM
54
0
0
06 May 2025
Faster MoE LLM Inference for Extremely Large Models
Haoqi Yang
Luohe Shi
Qiwei Li
Zuchao Li
Ping Wang
Bo Du
Mengjia Shen
Hai Zhao
MoE
63
0
0
06 May 2025
Improving Model Alignment Through Collective Intelligence of Open-Source LLMS
Junlin Wang
Roy Xie
Shang Zhu
Jue Wang
Ben Athiwaratkun
Bhuwan Dhingra
S. Song
Ce Zhang
James Y. Zou
ALM
31
0
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
Incentivizing Inclusive Contributions in Model Sharing Markets
Enpei Zhang
Jingyi Chai
Rui Ye
Yanfeng Wang
Siheng Chen
TDI
FedML
136
0
0
05 May 2025
RM-R1: Reward Modeling as Reasoning
X. Chen
Gaotang Li
Z. Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
150
0
0
05 May 2025
Invoke Interfaces Only When Needed: Adaptive Invocation for Large Language Models in Question Answering
Jihao Zhao
Chunlai Zhou
Biao Qin
52
0
0
05 May 2025
High-Fidelity Pseudo-label Generation by Large Language Models for Training Robust Radiology Report Classifiers
Brian Wong
Kaito Tanaka
32
0
0
03 May 2025
Cer-Eval: Certifiable and Cost-Efficient Evaluation Framework for LLMs
G. Wang
Z. Chen
Bo Li
Haifeng Xu
112
0
0
02 May 2025
Contextures: Representations from Contexts
Runtian Zhai
Kai Yang
Che-Ping Tsai
Burak Varici
Zico Kolter
Pradeep Ravikumar
113
0
0
02 May 2025
Position: Enough of Scaling LLMs! Lets Focus on Downscaling
Ayan Sengupta
Yash Goel
Tanmoy Chakraborty
34
0
0
02 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Bill Li
Blake Bordelon
Shane Bergsma
C. Pehlevan
Boris Hanin
Joel Hestness
39
0
0
02 May 2025
RayZer: A Self-supervised Large View Synthesis Model
Hanwen Jiang
Hao Tan
Peng Wang
Haian Jin
Yue Zhao
...
Kai Zhang
Fujun Luan
Kalyan Sunkavalli
Qixing Huang
Georgios Pavlakos
62
0
0
01 May 2025
ICQuant: Index Coding enables Low-bit LLM Quantization
Xinlin Li
Osama A. Hanna
Christina Fragouli
Suhas Diggavi
MQ
62
0
0
01 May 2025
Position: Foundation Models Need Digital Twin Representations
Yiqing Shen
Hao Ding
Lalithkumar Seenivasan
Tianmin Shu
Mathias Unberath
AI4CE
40
0
0
01 May 2025
Scalable Meta-Learning via Mixed-Mode Differentiation
Iurii Kemaev
Dan A Calian
Luisa M Zintgraf
Gregory Farquhar
H. V. Hasselt
57
0
0
01 May 2025
R&B: Domain Regrouping and Data Mixture Balancing for Efficient Foundation Model Training
Albert Ge
Tzu-Heng Huang
John Cooper
Avi Trost
Ziyi Chu
Satya Sai Srinath Namburi GNVV
Ziyang Cai
Kendall Park
Nicholas Roberts
Frederic Sala
53
0
0
01 May 2025
LENSLLM: Unveiling Fine-Tuning Dynamics for LLM Selection
Xinyue Zeng
Haohui Wang
Junhong Lin
Jun Wu
Tyler Cody
Dawei Zhou
91
0
0
01 May 2025
Improving Routing in Sparse Mixture of Experts with Graph of Tokens
Tam Minh Nguyen
Ngoc N. Tran
Khai Nguyen
Richard G. Baraniuk
MoE
59
0
0
01 May 2025
On the generalization of language models from in-context learning and finetuning: a controlled study
Andrew Kyle Lampinen
Arslan Chaudhry
Stephanie Chan
Cody Wild
Diane Wan
Alex Ku
Jorg Bornschein
Razvan Pascanu
Murray Shanahan
James L. McClelland
46
0
0
01 May 2025
Base Models Beat Aligned Models at Randomness and Creativity
Peter West
Christopher Potts
132
0
0
30 Apr 2025
COSMOS: Predictable and Cost-Effective Adaptation of LLMs
Jiayu Wang
Aws Albarghouthi
Frederic Sala
47
0
0
30 Apr 2025