Deep Learning Scaling is Predictable, Empirically

1 December 2017
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou

Papers citing "Deep Learning Scaling is Predictable, Empirically"

50 / 372 papers shown
Title
Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images
Krishnakant Singh
Thanush Navaratnam
Jannik Holmer
Simone Schaub-Meyer
Stefan Roth
DiffM
99
21
0
30 May 2024
Scaling Laws for the Value of Individual Data Points in Machine Learning
Ian Covert
Wenlong Ji
Tatsunori Hashimoto
James Zou
TDI
101
8
0
30 May 2024
Phase Transitions in the Output Distribution of Large Language Models
Julian Arnold
Flemming Holtorf
Frank Schafer
Niels Lörch
74
2
0
27 May 2024
gzip Predicts Data-dependent Scaling Laws
Rohan Pandey
82
11
0
26 May 2024
Small Language Models for Application Interactions: A Case Study
Beibin Li
Yi Zhang
Sébastien Bubeck
Jeevan Pathuri
Ishai Menache
89
4
0
23 May 2024
Unraveling overoptimism and publication bias in ML-driven science
Pouria Saidi
Gautam Dasarathy
Visar Berisha
87
2
0
23 May 2024
Super Tiny Language Models
Dylan Hillier
Leon Guertler
Cheston Tan
Palaash Agrawal
Ruirui Chen
Bobby Cheng
115
6
0
23 May 2024
The Platonic Representation Hypothesis
Minyoung Huh
Brian Cheung
Tongzhou Wang
Phillip Isola
142
142
0
13 May 2024
Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem
Xinbiao Wang
Yuxuan Du
Kecheng Liu
Yong Luo
Bo Du
Dacheng Tao
64
1
0
12 May 2024
Statistical divergences in high-dimensional hypothesis testing and a modern technique for estimating them
Jeremy J.H. Wilkinson
Christopher G. Lester
50
0
0
10 May 2024
KAN: Kolmogorov-Arnold Networks
Ziming Liu
Yixuan Wang
Sachin Vaidya
Fabian Ruehle
James Halverson
Marin Soljacic
Thomas Y. Hou
Max Tegmark
322
602
0
30 Apr 2024
The Simpler The Better: An Entropy-Based Importance Metric To Reduce Neural Networks' Depth
Victor Quétu
Zhu Liao
Enzo Tartaglione
115
4
0
27 Apr 2024
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim
Sanghyuk Chun
Taekyung Kim
Dongyoon Han
Sangdoo Yun
99
9
0
26 Apr 2024
NEPENTHE: Entropy-Based Pruning as a Neural Network Depth's Reducer
Zhu Liao
Victor Quétu
Van-Tam Nguyen
Enzo Tartaglione
73
2
0
24 Apr 2024
A Concise Tiling Strategy for Preserving Spatial Context in Earth Observation Imagery
Ellianna Abrahams
Tasha Snow
Matthew R. Siegfried
Fernando Pérez
73
1
0
16 Apr 2024
Decoupled Weight Decay for Any $p$ Norm
N. Outmezguine
Noam Levi
86
3
0
16 Apr 2024
How Deep Networks Learn Sparse and Hierarchical Data: the Sparse Random Hierarchy Model
Umberto M. Tomasini
Matthieu Wyart
BDL
113
7
0
16 Apr 2024
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Z. Kraus
Kian Kenyon-Dean
Saber Saberian
Maryam Fallah
Peter McLean
...
Chi Vicky Cheng
Kristen Morse
Maureen Makes
Ben Mabey
Berton Earnshaw
74
35
0
16 Apr 2024
How Much Data are Enough? Investigating Dataset Requirements for Patch-Based Brain MRI Segmentation Tasks
Dongang Wang
Peilin Liu
Hengrui Wang
H. Beadnall
K. Kyle
...
Tom Weidong Cai
Wanli Ouyang
Fernando Calamante
Michael Barnett
Chenyu Wang
68
2
0
04 Apr 2024
Transfer Learning from Whisper for Microscopic Intelligibility Prediction
Paul Best
Santiago Cuervo
R. Marxer
73
3
0
02 Apr 2024
Scaling Properties of Speech Language Models
Santiago Cuervo
R. Marxer
97
11
0
31 Mar 2024
Scaling Laws For Dense Retrieval
Yan Fang
Jingtao Zhan
Qingyao Ai
Jiaxin Mao
Weihang Su
Jia Chen
Yiqun Liu
193
10
0
27 Mar 2024
Data Mixing Laws: Optimizing Data Mixtures by Predicting Language Modeling Performance
Jiasheng Ye
Peiju Liu
Tianxiang Sun
Yunhua Zhou
Jun Zhan
Xipeng Qiu
147
76
0
25 Mar 2024
GeRM: A Generalist Robotic Model with Mixture-of-experts for Quadruped Robot
Wenxuan Song
Han Zhao
Pengxiang Ding
Can Cui
Shangke Lyu
Yaning Fan
Donglin Wang
OffRL
120
14
0
20 Mar 2024
Language models scale reliably with over-training and on downstream tasks
S. Gadre
Georgios Smyrnis
Vaishaal Shankar
Suchin Gururangan
Mitchell Wortsman
...
Y. Carmon
Achal Dave
Reinhard Heckel
Niklas Muennighoff
Ludwig Schmidt
ALM ELM LRM
183
48
0
13 Mar 2024
Unraveling the Mystery of Scaling Laws: Part I
Hui Su
Zhi Tian
Xiaoyu Shen
Xunliang Cai
119
21
0
11 Mar 2024
How much data do you need? Part 2: Predicting DL class specific training dataset sizes
Thomas Mühlenstädt
Jelena Frtunikj
48
2
0
10 Mar 2024
Not just Birds and Cars: Generic, Scalable and Explainable Models for Professional Visual Recognition
Junde Wu
Jiayuan Zhu
Min Xu
Yueming Jin
70
0
0
08 Mar 2024
When Scaling Meets LLM Finetuning: The Effect of Data, Model and Finetuning Method
Biao Zhang
Zhongtao Liu
Colin Cherry
Orhan Firat
LRM
116
159
0
27 Feb 2024
Model Compression and Efficient Inference for Large Language Models: A Survey
Wenxiao Wang
Wei Chen
Yicong Luo
Yongliu Long
Zhengkai Lin
Liye Zhang
Binbin Lin
Deng Cai
Xiaofei He
MQ
116
58
0
15 Feb 2024
A Tale of Tails: Model Collapse as a Change of Scaling Laws
Elvis Dohmatob
Yunzhen Feng
Pu Yang
Francois Charton
Julia Kempe
76
75
0
10 Feb 2024
Pretrained Generative Language Models as General Learning Frameworks for Sequence-Based Tasks
Ben Fauber
63
2
0
08 Feb 2024
A Resource Model For Neural Scaling Law
Jinyeop Song
Ziming Liu
Max Tegmark
Jeff Gore
160
4
0
07 Feb 2024
Scaling laws for learning with real and surrogate data
Ayush Jain
Andrea Montanari
Eren Sasoglu
109
14
0
06 Feb 2024
A Dynamical Model of Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
148
44
0
02 Feb 2024
Learning to Manipulate under Limited Information
Wesley H. Holliday
Alexander Kristoffersen
Eric Pacuit
200
4
0
29 Jan 2024
Computing in the Era of Large Generative Models: From Cloud-Native to AI-Native
Yao Lu
Song Bian
Lequn Chen
Yongjun He
Yulong Hui
...
Huanchen Zhang
Minjia Zhang
Qizhen Zhang
Tianyi Zhou
Danyang Zhuo
93
7
0
17 Jan 2024
Vlogger: Make Your Dream A Vlog
Shaobin Zhuang
Kunchang Li
Xinyuan Chen
Yaohui Wang
Ziwei Liu
Yu Qiao
Yali Wang
VGen DiffM
83
39
0
17 Jan 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM ALM
209
381
0
05 Jan 2024
Predicting Infant Brain Connectivity with Federated Multi-Trajectory GNNs using Scarce Data
Michalis Pistos
Gang Li
Weili Lin
Dinggang Shen
I. Rekik
58
0
0
01 Jan 2024
Beyond Chinchilla-Optimal: Accounting for Inference in Language Model Scaling Laws
Nikhil Sardana
Jacob P. Portes
Sasha Doubov
Jonathan Frankle
LRM
420
88
0
31 Dec 2023
Tell, don't show: Declarative facts influence how LLMs generalize
Alexander Meinke
Owain Evans
71
7
0
12 Dec 2023
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan
Kaifeng Chen
Dilip Krishnan
Dina Katabi
Phillip Isola
Yonglong Tian
CLIP VLM
82
68
0
07 Dec 2023
The Efficiency Spectrum of Large Language Models: An Algorithmic Survey
Tianyu Ding
Tianyi Chen
Haidong Zhu
Jiachen Jiang
Yiqi Zhong
Jinxin Zhou
Guangzhi Wang
Zhihui Zhu
Ilya Zharkov
Luming Liang
126
24
0
01 Dec 2023
A Probabilistic Method to Predict Classifier Accuracy on Larger Datasets given Small Pilot Data
Ethan Harvey
Wansu Chen
David M Kent
Michael C. Hughes
48
1
0
29 Nov 2023
Token-Level Adaptation of LoRA Adapters for Downstream Task Generalization
Joshua Belofsky
MoMe
64
13
0
17 Nov 2023
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Sotiris Anagnostidis
Gregor Bachmann
Imanol Schlag
Thomas Hofmann
85
2
0
06 Nov 2023
Robust Data Pruning under Label Noise via Maximizing Re-labeling Accuracy
Dongmin Park
Seola Choi
Doyoung Kim
Hwanjun Song
Jae-Gil Lee
NoLa
127
22
0
02 Nov 2023
The Open DAC 2023 Dataset and Challenges for Sorbent Discovery in Direct Air Capture
Anuroop Sriram
Sihoon Choi
Xiaohan Yu
Logan M. Brabson
Abhishek Das
Zachary W. Ulissi
Matthew Uyttendaele
A. Medford
D. Sholl
AI4CE
83
44
0
01 Nov 2023
Large Trajectory Models are Scalable Motion Predictors and Planners
Q. Sun
Shiduo Zhang
Danjiao Ma
Jingzhe Shi
Derun Li
Simian Luo
Yu Wang
Ningyi Xu
Guangzhi Cao
Hang Zhao
83
19
0
30 Oct 2023