1712.00409
Deep Learning Scaling is Predictable, Empirically
1 December 2017
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou
Papers citing
"Deep Learning Scaling is Predictable, Empirically"
50 / 386 papers shown
Title
No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets
Lorenzo Brigato
Stavroula Mougiakakou
27
3
0
04 Sep 2023
International Governance of Civilian AI: A Jurisdictional Certification Approach
Robert F. Trager
Benjamin Harack
Anka Reuel
A. Carnegie
Lennart Heim
...
R. Lall
Owen Larter
Seán Ó hÉigeartaigh
Simon Staffell
José Jaime Villalobos
26
20
0
29 Aug 2023
MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins
Tiberiu Sosea
Cornelia Caragea
16
12
0
17 Aug 2023
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation
Dong Huang
Qi Bu
Yuhao Qing
Heming Cui
LRM
32
16
0
17 Aug 2023
Boosting Semi-Supervised Learning by bridging high and low-confidence predictions
Khanh-Binh Nguyen
Joon-Sung Yang
27
7
0
15 Aug 2023
Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise
Fahimeh Fooladgar
Minh Nguyen Nhat To
P. Mousavi
Purang Abolmaesumi
NoLa
37
4
0
13 Aug 2023
A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models
Bilel Guetarni
Féryal Windal
H. Benhabiles
Marianne Petit
Romain Dubois
Emmanuelle Leteurtre
Dominique Collard
DiffM
25
2
0
02 Aug 2023
A Theory for Emergence of Complex Skills in Language Models
Sanjeev Arora
Anirudh Goyal
LRM
29
73
0
29 Jul 2023
The semantic landscape paradigm for neural networks
Shreyas Gokhale
21
2
0
18 Jul 2023
Scaling Laws for Imitation Learning in Single-Agent Games
Jens Tuyls
Dhruv Madeka
Kari Torkkola
Dean Phillips Foster
Karthik R. Narasimhan
Sham Kakade
32
4
0
18 Jul 2023
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Hiroki Naganuma
Ryuichiro Hataya
Kotaro Yoshida
Ioannis Mitliagkas
OODD
95
1
0
17 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
44
118
0
06 Jul 2023
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci
Chuning Li
Mufan Li
Bobby He
Thomas Hofmann
Chris J. Maddison
Daniel M. Roy
35
31
0
30 Jun 2023
Towards Sybil Resilience in Decentralized Learning
Thomas Werthenbach
J. Pouwelse
AAML
15
2
0
26 Jun 2023
Beyond Scale: the Diversity Coefficient as a Data Quality Metric Demonstrates LLMs are Pre-trained on Formally Diverse Data
Alycia Lee
Brando Miranda
Sudharsan Sundar
Sanmi Koyejo
42
6
0
24 Jun 2023
Scaling MLPs: A Tale of Inductive Bias
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
37
38
0
23 Jun 2023
Textbooks Are All You Need
Suriya Gunasekar
Yi Zhang
J. Aneja
C. C. T. Mendes
Allison Del Giorno
...
Sébastien Bubeck
Ronen Eldan
Adam Tauman Kalai
Y. Lee
Yuan-Fang Li
AI4CE
ALM
SyDa
38
392
0
20 Jun 2023
On the Joint Interaction of Models, Data, and Features
Yiding Jiang
Christina Baek
J. Zico Kolter
FedML
28
4
0
07 Jun 2023
GPT4Image: Large Pre-trained Models Help Vision Models Learn Better on Perception Task
Ning Ding
Yehui Tang
Zhongqian Fu
Chaoting Xu
Kai Han
Yunhe Wang
MLLM
VLM
37
2
0
01 Jun 2023
Dynamic Sparsity Is Channel-Level Sparsity Learner
Lu Yin
Gen Li
Meng Fang
Lijuan Shen
Tianjin Huang
Zhangyang Wang
Vlado Menkovski
Xiaolong Ma
Mykola Pechenizkiy
Shiwei Liu
33
20
0
30 May 2023
PubChemQC B3LYP/6-31G*//PM6 dataset: the Electronic Structures of 86 Million Molecules using B3LYP/6-31G* calculations
Maho Nakata
Toshiyuki Maeda
16
28
0
29 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim M. Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
42
58
0
22 May 2023
Lifelong Language Pretraining with Distribution-Specialized Experts
Wuyang Chen
Yan-Quan Zhou
Nan Du
Yanping Huang
James Laudon
Z. Chen
Claire Cu
KELM
29
48
0
20 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
26
40
0
09 May 2023
Model-agnostic Measure of Generalization Difficulty
Akhilan Boopathy
Kevin Liu
Jaedong Hwang
Shu Ge
Asaad Mohammedsaleh
Ila Fiete
80
4
0
01 May 2023
Predictability of Machine Learning Algorithms and Related Feature Extraction Techniques
Yu-nan Dong
6
0
0
30 Apr 2023
Are Emergent Abilities of Large Language Models a Mirage?
Rylan Schaeffer
Brando Miranda
Oluwasanmi Koyejo
LRM
50
396
0
28 Apr 2023
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Yiqun Yao
Siqi Fan
Xiusheng Huang
Xuezhi Fang
Xiang Li
...
Peng Han
Shuo Shang
Kang Liu
Aixin Sun
Yequan Wang
33
6
0
14 Apr 2023
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
MoE
LRM
25
99
0
06 Apr 2023
Multi-annotator Deep Learning: A Probabilistic Framework for Classification
M. Herde
Denis Huseljic
Bernhard Sick
38
9
0
05 Apr 2023
Text-to-Image Diffusion Models are Zero-Shot Classifiers
Kevin Clark
P. Jaini
DiffM
VLM
38
107
0
27 Mar 2023
kNN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference
Benfeng Xu
Quan Wang
Zhendong Mao
Yajuan Lyu
Qiaoqiao She
Yongdong Zhang
104
52
0
24 Mar 2023
The Quantization Model of Neural Scaling
Eric J. Michaud
Ziming Liu
Uzay Girit
Max Tegmark
MILM
27
77
0
23 Mar 2023
ExplainFix: Explainable Spatially Fixed Deep Networks
Alex Gaudio
Christos Faloutsos
A. Smailagic
P. Costa
A. Campilho
FAtt
35
3
0
18 Mar 2023
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Amro Abbas
Kushal Tirumala
Daniel Simig
Surya Ganguli
Ari S. Morcos
31
164
0
16 Mar 2023
Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks
Zahra Atashgahi
Xuhao Zhang
Neil Kichler
Shiwei Liu
Lu Yin
Mykola Pechenizkiy
Raymond N. J. Veldhuis
Decebal Constantin Mocanu
18
10
0
10 Mar 2023
Kernel Regression with Infinite-Width Neural Networks on Millions of Examples
Ben Adlam
Jaehoon Lee
Shreyas Padhy
Zachary Nado
Jasper Snoek
26
11
0
09 Mar 2023
Robust mmWave Beamforming by Self-Supervised Hybrid Deep Learning
Fenghao Zhu
Bohao Wang
Zhaohui Yang
Chongwen Huang
Zhaoyang Zhang
G. C. Alexandropoulos
Chau Yuen
Merouane Debbah
24
12
0
09 Mar 2023
Spatio-Temporal Structure Consistency for Semi-supervised Medical Image Classification
Wen-Ling Lei
Lei Liu
Li Liu
21
1
0
03 Mar 2023
A Meta-Learning Approach to Predicting Performance and Data Requirements
Achin Jain
Gurumurthy Swaminathan
Paolo Favaro
Hao Yang
Avinash Ravichandran
...
Alessandro Achille
Onkar Dabeer
Bernt Schiele
A. Swaminathan
Stefano Soatto
37
8
0
02 Mar 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
58
12,368
0
27 Feb 2023
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
Ghada Sokar
Rishabh Agarwal
Pablo Samuel Castro
Utku Evci
CLL
51
89
0
24 Feb 2023
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
38
29
0
19 Feb 2023
Cliff-Learning
T. T. Wang
I. Zablotchi
Nir Shavit
Jonathan S. Rosenfeld
44
0
0
14 Feb 2023
Data pruning and neural scaling laws: fundamental limitations of score-based algorithms
Fadhel Ayed
Soufiane Hayou
14
9
0
14 Feb 2023
Ten Lessons We Have Learned in the New "Sparseland": A Short Handbook for Sparse Neural Network Researchers
Shiwei Liu
Zhangyang Wang
32
30
0
06 Feb 2023
Scaling Laws for Hyperparameter Optimization
Arlind Kadra
Maciej Janowski
Martin Wistuba
Josif Grabocka
28
9
0
01 Feb 2023
A Closer Look at Few-shot Classification Again
Xu Luo
Hao Wu
Ji Zhang
Lianli Gao
Jing Xu
Jingkuan Song
24
48
0
28 Jan 2023
Scaling Laws for Generative Mixed-Modal Language Models
Armen Aghajanyan
L. Yu
Alexis Conneau
Wei-Ning Hsu
Karen Hambardzumyan
Susan Zhang
Stephen Roller
Naman Goyal
Omer Levy
Luke Zettlemoyer
MoE
VLM
19
104
0
10 Jan 2023
The case for 4-bit precision: k-bit Inference Scaling Laws
Tim Dettmers
Luke Zettlemoyer
MQ
27
218
0
19 Dec 2022