Deep Learning Scaling is Predictable, Empirically


1 December 2017
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou

Papers citing "Deep Learning Scaling is Predictable, Empirically"

50 / 372 papers shown
Title
Roles of Scaling and Instruction Tuning in Language Perception: Model vs. Human Attention
Changjiang Gao
Shujian Huang
Jixing Li
Jiajun Chen
LRM, ALM
88
7
0
29 Oct 2023
ArTST: Arabic Text and Speech Transformer
Hawau Olamide Toyin
Amirbek Djanibekov
Ajinkya Kulkarni
Hanan Aldarmaki
92
10
0
25 Oct 2023
Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models
Lihang Liu
Shanzhuo Zhang
Donglong He
Xianbin Ye
Jingbo Zhou
...
Fan Wang
Jingzhou He
Liang Zheng
Yonghui Li
Xiaomin Fang
AI4CE
88
9
0
21 Oct 2023
PrivImage: Differentially Private Synthetic Image Generation using Diffusion Models with Semantic-Aware Pretraining
Kecen Li
Chen Gong
Zhixiang Li
Yuzhong Zhao
Xinwen Hou
Tianhao Wang
99
10
0
19 Oct 2023
Predicting Emergent Abilities with Infinite Resolution Evaluation
Shengding Hu
Xin Liu
Xu Han
Xinrong Zhang
Chaoqun He
...
Ning Ding
Zebin Ou
Guoyang Zeng
Zhiyuan Liu
Maosong Sun
ELM, LRM
69
15
0
05 Oct 2023
A Neural Scaling Law from Lottery Ticket Ensembling
Ziming Liu
Max Tegmark
65
4
0
03 Oct 2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
Roberto L. Castro
Andrei Ivanov
Diego Andrade
Tal Ben-Nun
B. Fraguela
Torsten Hoefler
63
17
0
03 Oct 2023
Masked Autoencoders are Scalable Learners of Cellular Morphology
Oren Z. Kraus
Kian Kenyon-Dean
Saber Saberian
Maryam Fallah
Peter McLean
...
Chi Vicky Cheng
Kristen Morse
Maureen Makes
Ben Mabey
Berton Earnshaw
82
15
0
27 Sep 2023
Cluster-based pruning techniques for audio data
Boris Bergsma
Marta Brzezinska
O. Yazyev
Milos Cernak
64
3
0
21 Sep 2023
D3: Data Diversity Design for Systematic Generalization in Visual Question Answering
Amir Rahimi
Vanessa D’Amario
Moyuru Yamada
Kentaro Takemoto
Tomotake Sasaki
Xavier Boix
68
1
0
15 Sep 2023
Uncovering Neural Scaling Laws in Molecular Representation Learning
Dingshuo Chen
Yanqiao Zhu
Jieyu Zhang
Yuanqi Du
Zhixun Li
Qiang Liu
Shu Wu
Liang Wang
69
19
0
15 Sep 2023
Pretraining on the Test Set Is All You Need
Rylan Schaeffer
118
30
0
13 Sep 2023
Explaining grokking through circuit efficiency
Vikrant Varma
Rohin Shah
Zachary Kenton
János Kramár
Ramana Kumar
97
55
0
05 Sep 2023
No Data Augmentation? Alternative Regularizations for Effective Training on Small Datasets
Lorenzo Brigato
Stavroula Mougiakakou
79
5
0
04 Sep 2023
International Governance of Civilian AI: A Jurisdictional Certification Approach
Robert F. Trager
Benjamin Harack
Anka Reuel
A. Carnegie
Lennart Heim
...
R. Lall
Owen Larter
Seán Ó hÉigeartaigh
Simon Staffell
José Jaime Villalobos
77
20
0
29 Aug 2023
MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins
Tiberiu Sosea
Cornelia Caragea
84
12
0
17 Aug 2023
CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation
Dong Huang
Qi Bu
Yuhao Qing
Heming Cui
LRM
90
19
0
17 Aug 2023
Boosting Semi-Supervised Learning by bridging high and low-confidence predictions
Khanh-Binh Nguyen
Joon-Sung Yang
88
9
0
15 Aug 2023
Manifold DivideMix: A Semi-Supervised Contrastive Learning Framework for Severe Label Noise
Fahimeh Fooladgar
Minh-Son To
P. Mousavi
Purang Abolmaesumi
NoLa
71
6
0
13 Aug 2023
A vision transformer-based framework for knowledge transfer from multi-modal to mono-modal lymphoma subtyping models
Bilel Guetarni
Féryal Windal
H. Benhabiles
Marianne Petit
Romain Dubois
Emmanuelle Leteurtre
Dominique Collard
DiffM
70
2
0
02 Aug 2023
A Theory for Emergence of Complex Skills in Language Models
Sanjeev Arora
Anirudh Goyal
LRM
98
87
0
29 Jul 2023
The semantic landscape paradigm for neural networks
Shreyas Gokhale
94
2
0
18 Jul 2023
Scaling Laws for Imitation Learning in Single-Agent Games
Jens Tuyls
Dhruv Madeka
Kari Torkkola
Dean Phillips Foster
Karthik Narasimhan
Sham Kakade
48
5
0
18 Jul 2023
An Empirical Study of Pre-trained Model Selection for Out-of-Distribution Generalization and Calibration
Hiroki Naganuma
Ryuichiro Hataya
Kotaro Yoshida
Ioannis Mitliagkas
OODD
175
3
0
17 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
154
125
0
06 Jul 2023
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci
Chuning Li
Mufan Li
Bobby He
Thomas Hofmann
Chris J. Maddison
Daniel M. Roy
125
36
0
30 Jun 2023
Towards Sybil Resilience in Decentralized Learning
Thomas Werthenbach
J. Pouwelse
AAML
54
2
0
26 Jun 2023
Scaling MLPs: A Tale of Inductive Bias
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
110
39
0
23 Jun 2023
Textbooks Are All You Need
Suriya Gunasekar
Yi Zhang
J. Aneja
C. C. T. Mendes
Allison Del Giorno
...
Sébastien Bubeck
Ronen Eldan
Adam Tauman Kalai
Y. Lee
Yuan-Fang Li
AI4CE, ALM, SyDa
108
411
0
20 Jun 2023
On the Joint Interaction of Models, Data, and Features
Yiding Jiang
Christina Baek
J. Zico Kolter
FedML
62
4
0
07 Jun 2023
Dynamic Sparsity Is Channel-Level Sparsity Learner
Lu Yin
Gen Li
Meng Fang
Lijuan Shen
Tianjin Huang
Zhangyang Wang
Vlado Menkovski
Xiaolong Ma
Mykola Pechenizkiy
Shiwei Liu
94
21
0
30 May 2023
PubChemQC B3LYP/6-31G*//PM6 dataset: the Electronic Structures of 86 Million Molecules using B3LYP/6-31G* calculations
Maho Nakata
Toshiyuki Maeda
84
30
0
29 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
152
64
0
22 May 2023
Lifelong Language Pretraining with Distribution-Specialized Experts
Wuyang Chen
Yan-Quan Zhou
Nan Du
Yanping Huang
James Laudon
Zhiwen Chen
Claire Cu
KELM
111
52
0
20 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
94
41
0
09 May 2023
Model-agnostic Measure of Generalization Difficulty
Akhilan Boopathy
Kevin Liu
Jaedong Hwang
Shu Ge
Asaad Mohammedsaleh
Ila Fiete
138
4
0
01 May 2023
Predictability of Machine Learning Algorithms and Related Feature Extraction Techniques
Yu-nan Dong
18
0
0
30 Apr 2023
nanoLM: an Affordable LLM Pre-training Benchmark via Accurate Loss Prediction across Scales
Yiqun Yao
Siqi Fan
Xiusheng Huang
Xuezhi Fang
Xiang Li
...
Peng Han
Shuo Shang
Kang Liu
Aixin Sun
Yequan Wang
92
6
0
14 Apr 2023
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
MoE, LRM
126
108
0
06 Apr 2023
Multi-annotator Deep Learning: A Probabilistic Framework for Classification
M. Herde
Denis Huseljic
Bernhard Sick
80
9
0
05 Apr 2023
Text-to-Image Diffusion Models are Zero-Shot Classifiers
Kevin Clark
P. Jaini
DiffM, VLM
119
116
0
27 Mar 2023
$k$NN Prompting: Beyond-Context Learning with Calibration-Free Nearest Neighbor Inference
Benfeng Xu
Quan Wang
Zhendong Mao
Yajuan Lyu
Qiaoqiao She
Yongdong Zhang
140
53
0
24 Mar 2023
The Quantization Model of Neural Scaling
Eric J. Michaud
Ziming Liu
Uzay Girit
Max Tegmark
MILM
125
89
0
23 Mar 2023
ExplainFix: Explainable Spatially Fixed Deep Networks
Alex Gaudio
Christos Faloutsos
A. Smailagic
P. Costa
A. Campilho
FAtt
70
3
0
18 Mar 2023
SemDeDup: Data-efficient learning at web-scale through semantic deduplication
Amro Abbas
Kushal Tirumala
Daniel Simig
Surya Ganguli
Ari S. Morcos
79
183
0
16 Mar 2023
Supervised Feature Selection with Neuron Evolution in Sparse Neural Networks
Zahra Atashgahi
Xuhao Zhang
Neil Kichler
Shiwei Liu
Lu Yin
Mykola Pechenizkiy
Raymond N. J. Veldhuis
Decebal Constantin Mocanu
80
11
0
10 Mar 2023
Kernel Regression with Infinite-Width Neural Networks on Millions of Examples
Ben Adlam
Jaehoon Lee
Shreyas Padhy
Zachary Nado
Jasper Snoek
82
12
0
09 Mar 2023
Robust mmWave Beamforming by Self-Supervised Hybrid Deep Learning
Fenghao Zhu
Bohao Wang
Zhaohui Yang
Chongwen Huang
Zhaoyang Zhang
G. C. Alexandropoulos
Chau Yuen
Merouane Debbah
45
12
0
09 Mar 2023
Spatio-Temporal Structure Consistency for Semi-supervised Medical Image Classification
Wen-Ling Lei
Lei Liu
Li Liu
35
1
0
03 Mar 2023
A Meta-Learning Approach to Predicting Performance and Data Requirements
Achin Jain
Gurumurthy Swaminathan
Paolo Favaro
Hao Yang
Avinash Ravichandran
...
Alessandro Achille
Onkar Dabeer
Bernt Schiele
A. Swaminathan
Stefano Soatto
76
8
0
02 Mar 2023