ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2001.08361
  4. Cited By
Scaling Laws for Neural Language Models

Scaling Laws for Neural Language Models

23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
ArXivPDFHTML

Papers citing "Scaling Laws for Neural Language Models"

50 / 938 papers shown
Title
Galvatron: An Automatic Distributed System for Efficient Foundation Model Training
Galvatron: An Automatic Distributed System for Efficient Foundation Model Training
Xinyi Liu
Y. Wang
Shenhan Zhu
Fangcheng Fu
Qingshuo Liu
Guangming Lin
Bin Cui
GNN
134
0
0
30 Apr 2025
Model Connectomes: A Generational Approach to Data-Efficient Language Models
Model Connectomes: A Generational Approach to Data-Efficient Language Models
Klemen Kotar
Greta Tuckute
49
0
0
29 Apr 2025
Jekyll-and-Hyde Tipping Point in an AI's Behavior
Jekyll-and-Hyde Tipping Point in an AI's Behavior
Neil F. Johnson
Frank Yingjie Huo
46
0
0
29 Apr 2025
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
TF1-EN-3M: Three Million Synthetic Moral Fables for Training Small, Open Language Models
Mihai Nadas
Laura Diosan
Andrei Piscoran
Andreea Tomescu
VGen
57
0
0
29 Apr 2025
ReCIT: Reconstructing Full Private Data from Gradient in Parameter-Efficient Fine-Tuning of Large Language Models
ReCIT: Reconstructing Full Private Data from Gradient in Parameter-Efficient Fine-Tuning of Large Language Models
Jin Xie
Ruishi He
Songze Li
Xiaojun Jia
Shouling Ji
SILM
AAML
66
0
0
29 Apr 2025
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
BRIDGE: Benchmarking Large Language Models for Understanding Real-world Clinical Practice Text
Jiageng Wu
Bowen Gu
Ren Zhou
Kevin Xie
Doug Snyder
...
S.
Jonathan H. Chen
Santiago Romero-Brufau
K. J. Lin
Jie Yang
LM&MA
ELM
92
0
0
28 Apr 2025
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
TurboQuant: Online Vector Quantization with Near-optimal Distortion Rate
A. Zandieh
Majid Daliri
Majid Hadian
Vahab Mirrokni
MQ
74
0
0
28 Apr 2025
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Can a Crow Hatch a Falcon? Lineage Matters in Predicting Large Language Model Performance
Takuya Tamura
Taro Yano
Masafumi Enomoto
M. Oyamada
39
0
0
28 Apr 2025
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Accelerating Mixture-of-Experts Training with Adaptive Expert Replication
Athinagoras Skiadopoulos
Mark Zhao
Swapnil Gandhi
Thomas Norrie
Shrijeet Mukherjee
Christos Kozyrakis
MoE
91
0
0
28 Apr 2025
Platonic Grounding for Efficient Multimodal Language Models
Platonic Grounding for Efficient Multimodal Language Models
Moulik Choraria
Xinbo Wu
Akhil Bhimaraju
Nitesh Sekhar
Yue Wu
Xu Zhang
Prateek Singhal
L. Varshney
54
0
0
27 Apr 2025
VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction
VeriDebug: A Unified LLM for Verilog Debugging via Contrastive Embedding and Guided Correction
N. Wang
Bingkun Yao
Jie Zhou
Yuchen Hu
Xi Wang
Nan Guan
Zhe Jiang
36
0
0
27 Apr 2025
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Alexander Baumann
Leonardo Ayala
S.
Jan Sellner
Alexander Studier-Fischer
Berkin Özdemir
Lena Maier-Hein
Slobodan Ilic
51
0
0
27 Apr 2025
Anyprefer: An Agentic Framework for Preference Data Synthesis
Anyprefer: An Agentic Framework for Preference Data Synthesis
Yiyang Zhou
Z. Wang
Tianle Wang
Shangyu Xing
Peng Xia
...
Chetan Bansal
Weitong Zhang
Ying Wei
Mohit Bansal
Huaxiu Yao
61
0
0
27 Apr 2025
Bi-directional Model Cascading with Proxy Confidence
Bi-directional Model Cascading with Proxy Confidence
David Warren
Mark Dras
44
0
0
27 Apr 2025
PyViT-FUSE: A Foundation Model for Multi-Sensor Earth Observation Data
PyViT-FUSE: A Foundation Model for Multi-Sensor Earth Observation Data
Manuel Weber
Carly Beneke
ViT
63
0
0
26 Apr 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
Jiabin Fan
Guoqing Luo
Michael Bowling
Lili Mou
OffRL
63
0
0
26 Apr 2025
Scaling Laws For Scalable Oversight
Scaling Laws For Scalable Oversight
Joshua Engels
David D. Baek
Subhash Kantamneni
Max Tegmark
ELM
75
0
0
25 Apr 2025
NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation
NoEsis: Differentially Private Knowledge Transfer in Modular LLM Adaptation
Rob Romijnders
Stefanos Laskaridis
Ali Shahin Shamsabadi
Hamed Haddadi
64
0
0
25 Apr 2025
Appa: Bending Weather Dynamics with Latent Diffusion Models for Global Data Assimilation
Appa: Bending Weather Dynamics with Latent Diffusion Models for Global Data Assimilation
Gérome Andry
François Rozet
Sacha Lewin
Omer Rochman
Victor Mangeleer
Matthias Pirlet
Elise Faulx
Marilaure Grégoire
Gilles Louppe
AI4Cl
66
1
0
25 Apr 2025
DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering
DualRAG: A Dual-Process Approach to Integrate Reasoning and Retrieval for Multi-Hop Question Answering
Rong Cheng
J. Liu
Yan Zheng
Fei Ni
Jiazhen Du
Hangyu Mao
Fuzheng Zhang
Bo-Lan Wang
Jianye Hao
LRM
62
0
0
25 Apr 2025
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Evaluating Evaluation Metrics -- The Mirage of Hallucination Detection
Atharva Kulkarni
Yuan-kang Zhang
Joel Ruben Antony Moniz
Xiou Ge
Bo-Hsiang Tseng
Dhivya Piraviperumal
S.
Hong-ye Yu
HILM
83
0
0
25 Apr 2025
A multilevel approach to accelerate the training of Transformers
A multilevel approach to accelerate the training of Transformers
Guillaume Lauga
Maël Chaumette
Edgar Desainte-Maréville
Étienne Lasalle
Arthur Lebeurrier
AI4CE
40
0
0
24 Apr 2025
MAGIC: Near-Optimal Data Attribution for Deep Learning
MAGIC: Near-Optimal Data Attribution for Deep Learning
Andrew Ilyas
Logan Engstrom
TDI
39
0
0
23 Apr 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
Wenxuan Li
Hang Zhao
Zhiyuan Yu
Yu Du
Qin Zou
Ruizhen Hu
K. Xu
SSL
78
1
0
23 Apr 2025
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
QuaDMix: Quality-Diversity Balanced Data Selection for Efficient LLM Pretraining
Fengze Liu
Weidong Zhou
Binbin Liu
Zhimiao Yu
Yifan Zhang
...
Yifeng Yu
Bingni Zhang
Xiaohuan Zhou
Taifeng Wang
Yong Cao
61
1
0
23 Apr 2025
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Can Large Language Models Help Multimodal Language Analysis? MMLA: A Comprehensive Benchmark
Hanlei Zhang
Zhuohang Li
Yeshuang Zhu
Hua Xu
Peiwu Wang
Haige Zhu
Jie Zhou
Jinchao Zhang
39
0
0
23 Apr 2025
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
UrbanPlanBench: A Comprehensive Urban Planning Benchmark for Evaluating Large Language Models
Yu Zheng
Longyi Liu
Yuming Lin
Jie Feng
Guozhen Zhang
Depeng Jin
Yong Li
ELM
73
0
0
23 Apr 2025
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
From Reflection to Perfection: Scaling Inference-Time Optimization for Text-to-Image Diffusion Models via Reflection Tuning
Le Zhuo
Liangbing Zhao
Sayak Paul
Yue Liao
Renrui Zhang
Yi Xin
Peng Gao
Mohamed Elhoseiny
H. Li
VLM
75
0
0
22 Apr 2025
AlphaGrad: Non-Linear Gradient Normalization Optimizer
AlphaGrad: Non-Linear Gradient Normalization Optimizer
Soham Sane
ODL
56
0
0
22 Apr 2025
TTRL: Test-Time Reinforcement Learning
TTRL: Test-Time Reinforcement Learning
Yuxin Zuo
Kaiyan Zhang
Shang Qu
Li Sheng
Xuekai Zhu
Biqing Qi
Youbang Sun
Ganqu Cui
Ning Ding
Bowen Zhou
OffRL
135
1
0
22 Apr 2025
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Boosting Generative Image Modeling via Joint Image-Feature Synthesis
Theodoros Kouzelis
Efstathios Karypidis
Ioannis Kakogeorgiou
Spyros Gidaris
N. Komodakis
DiffM
38
0
0
22 Apr 2025
Efficient Pretraining Length Scaling
Efficient Pretraining Length Scaling
Bohong Wu
Shen Yan
Sijun Zhang
Jianqiao Lu
Yutao Zeng
Ya Wang
Xun Zhou
127
0
0
21 Apr 2025
Trillion 7B Technical Report
Trillion 7B Technical Report
Sungjun Han
Juyoung Suk
Suyeong An
Hyungguk Kim
Kyuseok Kim
Wonsuk Yang
Seungtaek Choi
Jamin Shin
113
0
0
21 Apr 2025
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Evaluating Judges as Evaluators: The JETTS Benchmark of LLM-as-Judges as Test-Time Scaling Evaluators
Yilun Zhou
Austin Xu
Peifeng Wang
Caiming Xiong
Shafiq R. Joty
ELM
ALM
LRM
50
2
0
21 Apr 2025
Efficient Federated Split Learning for Large Language Models over Communication Networks
Efficient Federated Split Learning for Large Language Models over Communication Networks
Kai Zhao
Zhaohui Yang
35
0
0
20 Apr 2025
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Dense Backpropagation Improves Training for Sparse Mixture-of-Experts
Ashwinee Panda
Vatsal Baherwani
Zain Sarwar
Benjamin Thérien
Supriyo Chakraborty
Tom Goldstein
MoE
39
0
0
16 Apr 2025
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
REAL: Benchmarking Autonomous Agents on Deterministic Simulations of Real Websites
Divyansh Garg
Shaun VanWeelden
Diego Caples
Andis Draguns
Nikil Ravi
...
Youngchul Joo
Jindong Gu
Charles London
Christian Schroeder de Witt
S. Motwani
39
1
0
15 Apr 2025
CSPLADE: Learned Sparse Retrieval with Causal Language Models
CSPLADE: Learned Sparse Retrieval with Causal Language Models
Zhichao Xu
Aosong Feng
Yijun Tian
Haibo Ding
Lin Leee Cheong
RALM
40
0
0
15 Apr 2025
Frozen Layers: Memory-efficient Many-fidelity Hyperparameter Optimization
Frozen Layers: Memory-efficient Many-fidelity Hyperparameter Optimization
Timur Carstensen
Neeratyoy Mallik
Frank Hutter
Martin Rapp
AI4CE
30
0
0
14 Apr 2025
Evaluation Under Imperfect Benchmarks and Ratings: A Case Study in Text Simplification
Evaluation Under Imperfect Benchmarks and Ratings: A Case Study in Text Simplification
Joseph Liu
Yoonsoo Nam
Xinyue Cui
Swabha Swayamdipta
53
0
0
13 Apr 2025
Towards Combinatorial Interpretability of Neural Computation
Towards Combinatorial Interpretability of Neural Computation
Micah Adler
Dan Alistarh
Nir Shavit
FAtt
110
1
0
10 Apr 2025
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
38
108
0
10 Apr 2025
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Missing Premise exacerbates Overthinking: Are Reasoning Models losing Critical Thinking Skill?
Chenrui Fan
Ming Li
Lichao Sun
Tianyi Zhou
LRM
51
3
0
09 Apr 2025
Pretraining Language Models for Diachronic Linguistic Change Discovery
Pretraining Language Models for Diachronic Linguistic Change Discovery
Elisabeth Fittschen
Sabrina Li
Tom Lippincott
Leshem Choshen
Craig Messner
26
0
0
07 Apr 2025
Universal Item Tokenization for Transferable Generative Recommendation
Universal Item Tokenization for Transferable Generative Recommendation
Bowen Zheng
Hongyu Lu
Yu Chen
Wayne Xin Zhao
Ji-Rong Wen
31
0
0
06 Apr 2025
Pre-training Generative Recommender with Multi-Identifier Item Tokenization
Pre-training Generative Recommender with Multi-Identifier Item Tokenization
Bowen Zheng
Enze Liu
Z. Chen
Zhongrui Ma
Yue Wang
Wayne Xin Zhao
Ji-Rong Wen
35
0
0
06 Apr 2025
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
STEP: Staged Parameter-Efficient Pre-training for Large Language Models
Kazuki Yano
Takumi Ito
Jun Suzuki
LRM
47
1
0
05 Apr 2025
Large (Vision) Language Models are Unsupervised In-Context Learners
Large (Vision) Language Models are Unsupervised In-Context Learners
Artyom Gadetsky
Andrei Atanov
Yulun Jiang
Zhitong Gao
Ghazal Hosseini Mighan
Amir Zamir
Maria Brbić
VLM
MLLM
LRM
69
0
0
03 Apr 2025
Hardware-Enabled Mechanisms for Verifying Responsible AI Development
Hardware-Enabled Mechanisms for Verifying Responsible AI Development
Aidan O'Gara
Gabriel Kulp
Will Hodgkins
James Petrie
Vincent Immler
Aydin Aysu
K. Basu
S. Bhasin
S. Picek
Ankur Srivastava
19
0
0
02 Apr 2025
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Advancing MoE Efficiency: A Collaboration-Constrained Routing (C2R) Strategy for Better Expert Parallelism Design
Mohan Zhang
Pingzhi Li
Jie Peng
Mufan Qiu
Tianlong Chen
MoE
45
0
0
02 Apr 2025
Previous
12345...171819
Next