ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Deep Learning Scaling is Predictable, Empirically (arXiv:1712.00409)

1 December 2017
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou

Papers citing "Deep Learning Scaling is Predictable, Empirically" (showing 50 of 386; topic tags in brackets)

• Neural Scaling Laws of Deep ReLU and Deep Operator Network: A Theoretical Study (01 Oct 2024)
  Hao Liu, Zecheng Zhang, Wenjing Liao, Hayden Schaeffer
• Learning non-Gaussian spatial distributions via Bayesian transport maps with parametric shrinkage (28 Sep 2024) [OT]
  Anirban Chakraborty, Matthias Katzfuss
• How Feature Learning Can Improve Neural Scaling Laws (26 Sep 2024)
  Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan
• Efficient Feature Interactions with Transformers: Improving User Spending Propensity Predictions in Gaming (25 Sep 2024) [AI4TS]
  Ved Prakash, Kartavya Kothari
• Anisotropic Diffusion Probabilistic Model for Imbalanced Image Classification (22 Sep 2024) [DiffM, MedIm]
  Jingyu Kong, Yuan Guo, Yu Wang, Yuping Duan
• Multilevel Interpretability Of Artificial Neural Networks: Leveraging Framework And Methods From Neuroscience (22 Aug 2024) [AI4CE]
  Zhonghao He, Jascha Achterberg, Katie Collins, Kevin K. Nejad, Danyal Akarca, ..., Chole Li, Kai J. Sandbrink, Stephen Casper, Anna Ivanova, Grace W. Lindsay
• Do Neural Scaling Laws Exist on Graph Self-Supervised Learning? (20 Aug 2024)
  Qian Ma, Haitao Mao, Jingzhe Liu, Zhehua Zhang, Chunlin Feng, Yu Song, Yihan Shao, Yao Ma
• Scaling Law with Learning Rate Annealing (20 Aug 2024)
  Howe Tissue, Venus Wang, Lu Wang
• ScalingFilter: Assessing Data Quality through Inverse Utilization of Scaling Laws (15 Aug 2024)
  Ruihang Li, Yixuan Wei, Miaosen Zhang, Nenghai Yu, Han Hu, Houwen Peng
• Towards flexible perception with visual memory (15 Aug 2024)
  Robert Geirhos, P. Jaini, Austin Stone, Sourabh Medapati, Xi Yi, G. Toderici, Abhijit Ogale, Jonathon Shlens
• A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning (13 Aug 2024) [MoMe]
  Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Joey Tianyi Zhou, Leshem Choshen, Alessandro Sordoni
• Dataset Scale and Societal Consistency Mediate Facial Impression Bias in Vision-Language AI (04 Aug 2024)
  Robert Wolfe, Aayushi Dangol, Alexis Hiniker, Bill Howe
• Are Bigger Encoders Always Better in Vision Large Models? (01 Aug 2024) [VLM]
  Bozhou Li, Hao Liang, Zimo Meng, Wentao Zhang
• Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress? (31 Jul 2024) [ELM]
  Richard Ren, Steven Basart, Adam Khoja, Alice Gatti, Long Phan, ..., Alexander Pan, Gabriel Mukobi, Ryan H. Kim, Stephen Fitz, Dan Hendrycks
• Machine learning surrogates for efficient hydrologic modeling: Insights from stochastic simulations of managed aquifer recharge (30 Jul 2024)
  Timothy Dai, Kate Maher, Z. Perzan
• Analyzing and reducing the synthetic-to-real transfer gap in Music Information Retrieval: the task of automatic drum transcription (29 Jul 2024)
  Mickaël Zehren, Marco Alunno, Paolo Bientinesi
• Seamless Website Fingerprinting in Multiple Environments (28 Jul 2024)
  Chuxu Song, Zining Fan, Hao Wang, Richard Martin
• Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT (25 Jul 2024) [MILM]
  Muhammad Ali, Swetasudha Panda, Qinlan Shen, Michael Wick, Ari Kobren
• Scaling Training Data with Lossy Image Compression (25 Jul 2024)
  Katherine L. Mentzer, Andrea Montanari
• Investigating learning-independent abstract reasoning in artificial neural networks (25 Jul 2024)
  T. Barak, Y. Loewenstein
• Know Your Limits: A Survey of Abstention in Large Language Models (25 Jul 2024)
  Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang
• CMR Scaling Law: Predicting Critical Mixture Ratios for Continual Pre-training of Language Models (24 Jul 2024) [CLL]
  Jiawei Gu, Zacc Yang, Chuanghao Ding, Rui Zhao, Fei Tan
• Bias of Stochastic Gradient Descent or the Architecture: Disentangling the Effects of Overparameterization of Neural Networks (04 Jul 2024)
  Amit Peleg, Matthias Hein
• DLO: Dynamic Layer Operation for Efficient Vertical Scaling of LLMs (03 Jul 2024) [MoE]
  Zhen Tan, Daize Dong, Xinyu Zhao, Jie Peng, Yu Cheng, Tianlong Chen
• The Art of Saying No: Contextual Noncompliance in Language Models (02 Jul 2024)
  Faeze Brahman, Sachin Kumar, Vidhisha Balachandran, Pradeep Dasigi, Valentina Pyatkin, ..., Jack Hessel, Yulia Tsvetkov, Noah A. Smith, Yejin Choi, Hannaneh Hajishirzi
• Resolving Discrepancies in Compute-Optimal Scaling of Language Models (27 Jun 2024)
  Tomer Porian, Mitchell Wortsman, J. Jitsev, Ludwig Schmidt, Y. Carmon
• Accuracy on the wrong line: On the pitfalls of noisy data for out-of-distribution generalisation (27 Jun 2024) [OODD]
  Amartya Sanyal, Yaxi Hu, Yaodong Yu, Yian Ma, Yixin Wang, Bernhard Schölkopf
• PharmaGPT: Domain-Specific Large Language Models for Bio-Pharmaceutical and Chemistry (26 Jun 2024) [AI4MH, LM&MA]
  Linqing Chen, Weilei Wang, Zilong Bai, Peng Xu, Yan Fang, ..., Lisha Zhang, Fu Bian, Zhongkai Ye, Lidong Pei, Changyang Tu
• Banishing LLM Hallucinations Requires Rethinking Generalization (25 Jun 2024) [LRM]
  Johnny Li, Saksham Consul, Eda Zhou, James Wong, Naila Farooqui, ..., Zhuxiaona Wei, Tian Wu, Ben Echols, Sharon Zhou, Gregory Diamos
• MoE-CT: A Novel Approach For Large Language Models Training With Resistance To Catastrophic Forgetting (25 Jun 2024) [CLL]
  Tianhao Li, Shangjie Li, Binbin Xie, Deyi Xiong, Baosong Yang
• Towards Exact Computation of Inductive Bias (22 Jun 2024)
  Akhilan Boopathy, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete
• Towards an Improved Understanding and Utilization of Maximum Manifold Capacity Representations (13 Jun 2024)
  Rylan Schaeffer, Victor Lecomte, Dhruv Pai, Andres Carranza, Berivan Isik, ..., Yann LeCun, SueYeon Chung, Andrey Gromov, Ravid Shwartz-Ziv, Sanmi Koyejo
• Scaling Laws in Linear Regression: Compute, Parameters, and Data (12 Jun 2024) [LRM]
  Licong Lin, Jingfeng Wu, Sham Kakade, Peter L. Bartlett, Jason D. Lee
• Are Protein Language Models Compute Optimal? (11 Jun 2024)
  Yaiza Serrano, Álvaro Ciudad, Alexis Molina
• Zyda: A 1.3T Dataset for Open Language Modeling (04 Jun 2024)
  Yury Tokpanov, Beren Millidge, Paolo Glorioso, Jonathan Pilault, Adam Ibrahim, James Whittington, Quentin Anthony
• D-CPT Law: Domain-specific Continual Pre-Training Scaling Law for Large Language Models (03 Jun 2024) [CLL]
  Haoran Que, Jiaheng Liu, Ge Zhang, Chenchen Zhang, Xingwei Qu, ..., Jie Fu, Wenbo Su, Jiamang Wang, Lin Qu, Bo Zheng
• Training on the Edge of Stability Is Caused by Layerwise Jacobian Alignment (31 May 2024)
  Mark Lowell, Catharine A. Kastner
• Is Synthetic Data all We Need? Benchmarking the Robustness of Models Trained with Synthetic Images (30 May 2024) [DiffM]
  Krishnakant Singh, Thanush Navaratnam, Jannik Holmer, Simone Schaub-Meyer, Stefan Roth
• Scaling Laws for the Value of Individual Data Points in Machine Learning (30 May 2024) [TDI]
  Ian Covert, Wenlong Ji, Tatsunori Hashimoto, James Zou
• Phase Transitions in the Output Distribution of Large Language Models (27 May 2024)
  Julian Arnold, Flemming Holtorf, Frank Schafer, Niels Lörch
• gzip Predicts Data-dependent Scaling Laws (26 May 2024)
  Rohan Pandey
• Small Language Models for Application Interactions: A Case Study (23 May 2024)
  Beibin Li, Yi Zhang, Sébastien Bubeck, Jeevan Pathuri, Ishai Menache
• Unraveling overoptimism and publication bias in ML-driven science (23 May 2024)
  Pouria Saidi, Gautam Dasarathy, Visar Berisha
• Super Tiny Language Models (23 May 2024)
  Dylan Hillier, Leon Guertler, Cheston Tan, Palaash Agrawal, Ruirui Chen, Bobby Cheng
• The Platonic Representation Hypothesis (13 May 2024)
  Minyoung Huh, Brian Cheung, Tongzhou Wang, Phillip Isola
• Separable Power of Classical and Quantum Learning Protocols Through the Lens of No-Free-Lunch Theorem (12 May 2024)
  Xinbiao Wang, Yuxuan Du, Kecheng Liu, Yong Luo, Bo Du, Dacheng Tao
• Statistical divergences in high-dimensional hypothesis testing and a modern technique for estimating them (10 May 2024)
  Jeremy J.H. Wilkinson, Christopher G. Lester
• KAN: Kolmogorov-Arnold Networks (30 Apr 2024)
  Ziming Liu, Yixuan Wang, Sachin Vaidya, Fabian Ruehle, James Halverson, Marin Soljacic, Thomas Y. Hou, Max Tegmark
• The Simpler The Better: An Entropy-Based Importance Metric To Reduce Neural Networks' Depth (27 Apr 2024)
  Victor Quétu, Zhu Liao, Enzo Tartaglione
• HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts (26 Apr 2024)
  Wonjae Kim, Sanghyuk Chun, Taekyung Kim, Dongyoon Han, Sangdoo Yun