ResearchTrend.AI
Deep Learning Scaling is Predictable, Empirically

1 December 2017
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou
arXiv:1712.00409

Papers citing "Deep Learning Scaling is Predictable, Empirically"

50 / 372 papers shown

Scaling Laws for Online Advertisement Retrieval
Yunli Wang
Zhiyong Yang
Zheng Zhang
Zhiqiang Wang
Jian Yang
Shiyang Wen
Peng Jiang
Kun Gai
OffRL
20 Nov 2024

Understanding Scaling Laws with Statistical and Approximation Theory for Transformer Neural Networks on Intrinsically Low-dimensional Data
Alex Havrilla
Wenjing Liao
11 Nov 2024

Scaling Laws for Pre-training Agents and World Models
Tim Pearce
Tabish Rashid
Dave Bignell
Raluca Georgescu
Sam Devlin
Katja Hofmann
LM&Ro
07 Nov 2024

FedMoE-DA: Federated Mixture of Experts via Domain Aware Fine-grained Aggregation
Ziwei Zhan
Wenkuan Zhao
Yuanqing Li
Weijie Liu
Xiaoxi Zhang
Chee Wei Tan
Chuan Wu
Deke Guo
Xu Chen
MoE
04 Nov 2024

Data movement limits to frontier model training
Ege Erdil
David Schneider-Joseph
02 Nov 2024

Does equivariance matter at scale?
Johann Brehmer
S. Behrends
P. D. Haan
Taco S. Cohen
30 Oct 2024

How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
29 Oct 2024

A Simple Model of Inference Scaling Laws
Noam Levi
LRM
21 Oct 2024

Transfer Learning on Multi-Dimensional Data: A Novel Approach to Neural Network-Based Surrogate Modeling
Adrienne M. Propp
Daniel M. Tartakovsky
AI4CE
16 Oct 2024

Towards Neural Scaling Laws for Time Series Foundation Models
Qingren Yao
Chao-Han Huck Yang
Renhe Jiang
Yuxuan Liang
Ming Jin
Shirui Pan
AI4TS, AI4CE
16 Oct 2024

MLP-SLAM: Multilayer Perceptron-Based Simultaneous Localization and Mapping With a Dynamic and Static Object Discriminator
Taozhe Li
Wei Sun
14 Oct 2024

ALLoRA: Adaptive Learning Rate Mitigates LoRA Fatal Flaws
Hai Huang
Randall Balestriero
13 Oct 2024

Analyzing Neural Scaling Laws in Two-Layer Networks with Power-Law Data Spectra
Roman Worschech
B. Rosenow
11 Oct 2024

Scaling Laws for Predicting Downstream Performance in LLMs
Yangyi Chen
Binxuan Huang
Yifan Gao
Zhengyang Wang
Jingfeng Yang
Heng Ji
LRM
11 Oct 2024

Scaling Laws For Diffusion Transformers
Zhengyang Liang
Hao He
Ceyuan Yang
Bo Dai
10 Oct 2024

Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study
Hao Liu
Zecheng Zhang
Wenjing Liao
Hayden Schaeffer
01 Oct 2024

Learning non-Gaussian spatial distributions via Bayesian transport maps with parametric shrinkage
Anirban Chakraborty
Matthias Katzfuss
OT
28 Sep 2024

How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
26 Sep 2024

Efficient Feature Interactions with Transformers: Improving User Spending Propensity Predictions in Gaming
Ved Prakash
Kartavya Kothari
AI4TS
25 Sep 2024

Anisotropic Diffusion Probabilistic Model for Imbalanced Image Classification
Jingyu Kong
Yuan Guo
Yu Wang
Yuping Duan
DiffM, MedIm
22 Sep 2024

Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience
Zhonghao He
Jascha Achterberg
Katie Collins
Kevin K. Nejad
Danyal Akarca
...
Chole Li
Kai J. Sandbrink
Stephen Casper
Anna Ivanova
Grace W. Lindsay
AI4CE
22 Aug 2024

Do Neural Scaling Laws Exist on Graph Self-Supervised Learning?
Qian Ma
Haitao Mao
Jingzhe Liu
Zhehua Zhang
Chunlin Feng
Yu Song
Yihan Shao
Yao Ma
20 Aug 2024

Scaling Law with Learning Rate Annealing
Howe Tissue
Venus Wang
Lu Wang
20 Aug 2024

ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws
Ruihang Li
Yixuan Wei
Miaosen Zhang
Nenghai Yu
Han Hu
Houwen Peng
15 Aug 2024

Towards flexible perception with visual memory
Robert Geirhos
P. Jaini
Austin Stone
Sourabh Medapati
Xi Yi
G. Toderici
Abhijit Ogale
Jonathon Shlens
15 Aug 2024

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning
Prateek Yadav
Colin Raffel
Mohammed Muqeeth
Lucas Caccia
Haokun Liu
Tianlong Chen
Joey Tianyi Zhou
Leshem Choshen
Alessandro Sordoni
MoMe
13 Aug 2024

Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI
Robert Wolfe
Aayushi Dangol
Alexis Hiniker
Bill Howe
04 Aug 2024

Are Bigger Encoders Always Better in Vision Large Models?
Bozhou Li
Hao Liang
Zimo Meng
Wentao Zhang
VLM
01 Aug 2024

Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren
Steven Basart
Adam Khoja
Alice Gatti
Long Phan
...
Alexander Pan
Gabriel Mukobi
Ryan H. Kim
Stephen Fitz
Dan Hendrycks
ELM
31 Jul 2024

Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge
Timothy Dai
Kate Maher
Z. Perzan
30 Jul 2024

Analyzing and reducing the synthetic-to-real transfer gap in Music Information Retrieval: the task of automatic drum transcription
Mickaël Zehren
Marco Alunno
Paolo Bientinesi
29 Jul 2024

Seamless Website Fingerprinting in Multiple Environments
Chuxu Song
Zining Fan
Hao Wang
Richard Martin
28 Jul 2024

Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT
Muhammad Ali
Swetasudha Panda
Qinlan Shen
Michael Wick
Ari Kobren
MILM
25 Jul 2024

Scaling Training Data with Lossy Image Compression
Katherine L. Mentzer
Andrea Montanari
25 Jul 2024

CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models
Jiawei Gu
Zacc Yang
Chuanghao Ding
Rui Zhao
Fei Tan
CLL
24 Jul 2024

Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks
Amit Peleg
Matthias Hein
04 Jul 2024

DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs
Zhen Tan
Daize Dong
Xinyu Zhao
Jie Peng
Yu Cheng
Tianlong Chen
MoE
03 Jul 2024

The Art of Saying No: Contextual Noncompliance in Language Models
Faeze Brahman
Sachin Kumar
Vidhisha Balachandran
Pradeep Dasigi
Valentina Pyatkin
...
Jack Hessel
Yulia Tsvetkov
Noah A. Smith
Yejin Choi
Hannaneh Hajishirzi
02 Jul 2024

Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
27 Jun 2024

Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation
Amartya Sanyal
Yaxi Hu
Yaodong Yu
Yian Ma
Yixin Wang
Bernhard Schölkopf
OODD
27 Jun 2024

PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry
Linqing Chen
Weilei Wang
Zilong Bai
Peng Xu
Yan Fang
...
Lisha Zhang
Fu Bian
Zhongkai Ye
Lidong Pei
Changyang Tu
AI4MH, LM&MA
26 Jun 2024

Banishing LLM Hallucinations Requires Rethinking Generalization
Johnny Li
Saksham Consul
Eda Zhou
James Wong
Naila Farooqui
...
Zhuxiaona Wei
Tian Wu
Ben Echols
Sharon Zhou
Gregory Diamos
LRM
25 Jun 2024

MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting
Tianhao Li
Shangjie Li
Binbin Xie
Deyi Xiong
Baosong Yang
CLL
25 Jun 2024

Towards Exact Computation of Inductive Bias
Akhilan Boopathy
William Yue
Jaedong Hwang
Abhiram Iyer
Ila Fiete
22 Jun 2024

Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations
Rylan Schaeffer
Victor Lecomte
Dhruv Pai
Andres Carranza
Berivan Isik
...
Yann LeCun
SueYeon Chung
Andrey Gromov
Ravid Shwartz-Ziv
Sanmi Koyejo
13 Jun 2024

Scaling Laws in Linear Regression: Compute, Parameters, and Data
Licong Lin
Jingfeng Wu
Sham Kakade
Peter L. Bartlett
Jason D. Lee
LRM
12 Jun 2024

Are Protein Language Models Compute Optimal?
Yaiza Serrano
Álvaro Ciudad
Alexis Molina
11 Jun 2024

Zyda: A 1.3T Dataset for Open Language Modeling
Yury Tokpanov
Beren Millidge
Paolo Glorioso
Jonathan Pilault
Adam Ibrahim
James Whittington
Quentin Anthony
04 Jun 2024

D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models
Haoran Que
Jiaheng Liu
Ge Zhang
Chenchen Zhang
Xingwei Qu
...
Jie Fu
Wenbo Su
Jiamang Wang
Lin Qu
Bo Zheng
CLL
03 Jun 2024

Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment
Mark Lowell
Catharine A. Kastner
31 May 2024