arXiv:1712.00409
Deep Learning Scaling is Predictable, Empirically
1 December 2017
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou
Papers citing "Deep Learning Scaling is Predictable, Empirically"
50 / 386 papers shown
Superposition Yields Robust Neural Scaling
Yizhou Liu
Ziming Liu
Jeff Gore
MILM
24
0
0
15 May 2025
Parallel Scaling Law for Language Models
Mouxiang Chen
Binyuan Hui
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
Jianling Sun
Junyang Lin
Zhongxin Liu
MoE
LRM
37
0
0
15 May 2025
Learning curves theory for hierarchically compositional data with power-law distributed features
Francesco Cagnetta
Hyunmo Kang
M. Wyart
38
0
0
11 May 2025
Extended Fiducial Inference for Individual Treatment Effects via Deep Neural Networks
Sehwan Kim
F. Liang
FedML
57
0
0
04 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
44
0
0
02 May 2025
Learning to Reason under Off-Policy Guidance
Jianhao Yan
Yafu Li
Zican Hu
Zhi Wang
Ganqu Cui
Xiaoye Qu
Yu Cheng
Yue Zhang
OffRL
LRM
44
0
0
21 Apr 2025
Multispectral airborne laser scanning for tree species classification: a benchmark of machine learning and deep learning algorithms
Josef Taher
Eric Hyyppä
Matti Hyyppä
Klaara Salolahti
Xiaowei Yu
...
Roope Näsi
H. Hyyti
Siiri Pyykkönen
Peilun Hu
Juha Hyyppä
26
0
0
19 Apr 2025
Evaluation Under Imperfect Benchmarks and Ratings: A Case Study in Text Simplification
Joseph Liu
Yoonsoo Nam
Xinyue Cui
Swabha Swayamdipta
56
0
0
13 Apr 2025
Hyperflows: Pruning Reveals the Importance of Weights
Eugen Barbulescu
Antonio Alexoaie
31
0
0
06 Apr 2025
Data Scaling Laws for End-to-End Autonomous Driving
Alexander Naumann
Xunjiang Gu
Tolga Dimlioglu
Mariusz Bojarski
Alperen Degirmenci
A. Popov
Devansh Bisla
Marco Pavone
Urs Muller
Boris Ivanovic
50
0
0
06 Apr 2025
Compression Laws for Large Language Models
Ayan Sengupta
Siddhant Chaudhary
Tanmoy Chakraborty
31
0
0
06 Apr 2025
Geometric Median Matching for Robust k-Subset Selection from Noisy Data
Anish Acharya
Sujay Sanghavi
Alexandros G. Dimakis
Inderjit S. Dhillon
AAML
62
0
0
01 Apr 2025
Force-Free Molecular Dynamics Through Autoregressive Equivariant Networks
Fabian L. Thiemann
Thiago Reschützegger
Massimiliano Esposito
Tseden Taddese
Juan D. Olarte-Plata
Fausto Martelli
AI4CE
52
0
0
31 Mar 2025
Scaling Laws of Synthetic Data for Language Models
Zeyu Qin
Qingxiu Dong
Xingxing Zhang
Li Dong
Xiaolong Huang
...
Hany Awadalla
Yi R. Fung
Weizhu Chen
Minhao Cheng
Furu Wei
SyDa
81
2
0
25 Mar 2025
Improving Quantization with Post-Training Model Expansion
Giuseppe Franco
Pablo Monteagudo-Lago
Ian Colbert
Nicholas J. Fraser
Michaela Blott
MQ
65
2
0
21 Mar 2025
A Multi-Power Law for Loss Curve Prediction Across Learning Rate Schedules
Kairong Luo
Haodong Wen
Shengding Hu
Zhenbo Sun
Zhiyuan Liu
Maosong Sun
Kaifeng Lyu
Wenguang Chen
CLL
67
2
0
17 Mar 2025
Scale Efficient Training for Large Datasets
Qing Zhou
Junyu Gao
Qi Wang
DD
78
0
0
17 Mar 2025
Robustness Tokens: Towards Adversarial Robustness of Transformers
Brian Pulfer
Yury Belousov
S. Voloshynovskiy
AAML
45
0
0
13 Mar 2025
Cost-Optimal Grouped-Query Attention for Long-Context Modeling
Yuxiao Chen
Yutong Wu
Chenyang Song
Xu Han
Zhiyuan Liu
Maosong Sun
73
0
0
12 Mar 2025
Communication-Efficient Language Model Training Scales Reliably and Robustly: Scaling Laws for DiLoCo
Zachary B. Charles
Gabriel Teston
Lucio Dery
Keith Rush
Nova Fallen
Zachary Garrett
Arthur Szlam
Arthur Douillard
193
0
0
12 Mar 2025
RouterEval: A Comprehensive Benchmark for Routing LLMs to Explore Model-level Scaling Up in LLMs
Zhongzhan Huang
Guoming Ling
Vincent S. Liang
Yupei Lin
Yandong Chen
Shanshan Zhong
Hefeng Wu
LRM
54
2
0
08 Mar 2025
Hebbian learning the local structure of language
P. Myles Eugenio
63
0
0
03 Mar 2025
Scaling Law Phenomena Across Regression Paradigms: Multiple and Kernel Approaches
Yifang Chen
Xuyang Guo
Xiaoyu Li
Yingyu Liang
Zhenmei Shi
Zhao Song
73
3
0
03 Mar 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clementine Domine
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
38
0
0
28 Feb 2025
(Mis)Fitting: A Survey of Scaling Laws
Margaret Li
Sneha Kudugunta
Luke Zettlemoyer
69
2
0
26 Feb 2025
Distributional Scaling Laws for Emergent Capabilities
Rosie Zhao
Tian Qin
David Alvarez-Melis
Sham Kakade
Naomi Saphra
LRM
39
1
0
24 Feb 2025
Model-agnostic Coreset Selection via LLM-based Concept Bottlenecks
Akshay Mehra
Trisha Mittal
Subhadra Gopalakrishnan
Joshua Kimball
45
0
0
23 Feb 2025
Factual Inconsistency in Data-to-Text Generation Scales Exponentially with LLM Size: A Statistical Validation
Joy Mahapatra
Soumyajit Roy
Utpal Garain
HILM
ALM
88
0
0
17 Feb 2025
Privacy-Preserving Dataset Combination
Keren Fuentes
Mimee Xu
Irene Chen
43
0
0
09 Feb 2025
Top Ten Challenges Towards Agentic Neural Graph Databases
Jiaxin Bai
Zehua Wang
Yukun Zhou
Hang Yin
Weizhi Fei
...
Binhang Yuan
Wei Wang
Lei Chen
Xiaofang Zhou
Yangqiu Song
124
1
0
24 Jan 2025
Geometric Median (GM) Matching for Robust Data Pruning
Anish Acharya
Inderjit S. Dhillon
Sujay Sanghavi
AAML
59
0
0
20 Jan 2025
Large Language Monkeys: Scaling Inference Compute with Repeated Sampling
Bradley Brown
Jordan Juravsky
Ryan Ehrlich
Ronald Clark
Quoc V. Le
Christopher Ré
Azalia Mirhoseini
ALM
LRM
95
224
0
03 Jan 2025
Towards Precise Scaling Laws for Video Diffusion Transformers
Yuanyang Yin
Yaqi Zhao
Mingwu Zheng
Ke Lin
Jiarong Ou
...
Pengfei Wan
Di Zhang
Baoqun Yin
Wentao Zhang
Kun Gai
124
2
0
03 Jan 2025
VisDoM: Multi-Document QA with Visually Rich Elements Using Multimodal Retrieval-Augmented Generation
Manan Suri
Puneet Mathur
Franck Dernoncourt
Kanika Goswami
Ryan A. Rossi
Dinesh Manocha
95
3
0
14 Dec 2024
Implicit Delta Learning of High Fidelity Neural Network Potentials
Stephan Thaler
Cristian Gabellini
Nikhil Shenoy
Prudencio Tossou
AI4CE
90
0
0
08 Dec 2024
Scaling Laws for Online Advertisement Retrieval
Yunli Wang
Zhiyong Yang
Z. Zhang
Zhiqiang Wang
Jian Yang
Shiyang Wen
Peng Jiang
Kun Gai
OffRL
75
4
0
20 Nov 2024
Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
Alex Havrilla
Wenjing Liao
39
8
0
11 Nov 2024
Scaling Laws for Pre-training Agents and World Models
Tim Pearce
Tabish Rashid
Dave Bignell
Raluca Georgescu
Sam Devlin
Katja Hofmann
LM&Ro
42
6
0
07 Nov 2024
FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation
Ziwei Zhan
Wenkuan Zhao
Yuanqing Li
Weijie Liu
Xiaoxi Zhang
Chee Wei Tan
Chuan Wu
Deke Guo
Xu Chen
MoE
48
1
0
04 Nov 2024
Data movement limits to frontier model training
Ege Erdil
David Schneider-Joseph
41
1
0
02 Nov 2024
Does equivariance matter at scale?
Johann Brehmer
S. Behrends
P. D. Haan
Taco S. Cohen
55
11
0
30 Oct 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
83
8
0
29 Oct 2024
A Simple Model of Inference Scaling Laws
Noam Levi
LRM
32
6
0
21 Oct 2024
Transfer Learning on Multi-Dimensional Data: A Novel Approach to Neural Network-Based Surrogate Modeling
Adrienne M. Propp
Daniel M. Tartakovsky
AI4CE
33
2
0
16 Oct 2024
Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao
Chao-Han Huck Yang
Renhe Jiang
Keli Zhang
Ming Jin
Shirui Pan
AI4TS
AI4CE
47
7
0
16 Oct 2024
MLP-SLAM: Multilayer Perceptron-Based Simultaneous Localization and Mapping With a Dynamic and Static Object Discriminator
Taozhe Li
Wei Sun
34
0
0
14 Oct 2024
ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws
Hai Huang
Randall Balestriero
35
0
0
13 Oct 2024
Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
Roman Worschech
B. Rosenow
41
0
0
11 Oct 2024
Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen
Binxuan Huang
Yifan Gao
Zhengyang Wang
Jingfeng Yang
Heng Ji
LRM
53
9
0
11 Oct 2024
Scaling Laws For Diffusion Transformers
Zhengyang Liang
Hao He
Ceyuan Yang
Bo Dai
35
9
0
10 Oct 2024