ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2001.08361
  4. Cited By
Scaling Laws for Neural Language Models

Scaling Laws for Neural Language Models

23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
ArXivPDFHTML

Papers citing "Scaling Laws for Neural Language Models"

50 / 990 papers shown
Title
LeTI: Learning to Generate from Textual Interactions
LeTI: Learning to Generate from Textual Interactions
Xingyao Wang
Hao Peng
Reyhaneh Jabbarvand
Heng Ji
35
30
0
17 May 2023
OSDP: Optimal Sharded Data Parallel for Distributed Deep Learning
Youhe Jiang
Fangcheng Fu
Xupeng Miao
Xiaonan Nie
Tengjiao Wang
36
11
0
17 May 2023
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Revisiting the Minimalist Approach to Offline Reinforcement Learning
Denis Tarasov
Vladislav Kurenkov
Alexander Nikulin
Sergey Kolesnikov
OffRL
33
36
0
16 May 2023
MoMo: Momentum Models for Adaptive Learning Rates
MoMo: Momentum Models for Adaptive Learning Rates
Fabian Schaipp
Ruben Ohana
Michael Eickenberg
Aaron Defazio
Robert Mansel Gower
35
10
0
12 May 2023
How Good are Commercial Large Language Models on African Languages?
How Good are Commercial Large Language Models on African Languages?
Jessica Ojo
Kelechi Ogueji
26
5
0
11 May 2023
LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits
  Siamese-BLOOM
LACoS-BLOOM: Low-rank Adaptation with Contrastive objective on 8 bits Siamese-BLOOM
Wenhui Hua
Brian Williams
Davood Shamsi
28
3
0
10 May 2023
When and What to Ask Through World States and Text Instructions: IGLU
  NLP Challenge Solution
When and What to Ask Through World States and Text Instructions: IGLU NLP Challenge Solution
Zhengxiang Shi
Jerome Ramos
To Eun Kim
Xi Wang
Hossein A. Rahmani
Aldo Lipani
27
10
0
09 May 2023
What is the best recipe for character-level encoder-only modelling?
What is the best recipe for character-level encoder-only modelling?
Kris Cao
32
2
0
09 May 2023
The Vault: A Comprehensive Multilingual Dataset for Advancing Code
  Understanding and Generation
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Dũng Nguyễn Mạnh
Nam Le Hai
An Dau
A. Nguyen
Khanh N. Nghiem
Jingnan Guo
Nghi D. Q. Bui
34
15
0
09 May 2023
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with
  Large Language Models
SUR-adapter: Enhancing Text-to-Image Pre-trained Diffusion Models with Large Language Models
Shan Zhong
Zhongzhan Huang
Wushao Wen
Jinghui Qin
Liang Lin
24
40
0
09 May 2023
Augmented Large Language Models with Parametric Knowledge Guiding
Augmented Large Language Models with Parametric Knowledge Guiding
Ziyang Luo
Can Xu
Pu Zhao
Xiubo Geng
Chongyang Tao
Jing Ma
Qingwei Lin
Daxin Jiang
KELM
RALM
37
44
0
08 May 2023
Residual Prompt Tuning: Improving Prompt Tuning with Residual
  Reparameterization
Residual Prompt Tuning: Improving Prompt Tuning with Residual Reparameterization
Anastasia Razdaibiedina
Yuning Mao
Rui Hou
Madian Khabsa
M. Lewis
Jimmy Ba
Amjad Almahairi
VLM
27
42
0
06 May 2023
Towards Being Parameter-Efficient: A Stratified Sparsely Activated
  Transformer with Dynamic Capacity
Towards Being Parameter-Efficient: A Stratified Sparsely Activated Transformer with Dynamic Capacity
Da Xu
Maha Elbayad
Kenton W. Murray
Jean Maillard
Vedanuj Goswami
MoE
47
3
0
03 May 2023
Sparsified Model Zoo Twins: Investigating Populations of Sparsified
  Neural Network Models
Sparsified Model Zoo Twins: Investigating Populations of Sparsified Neural Network Models
D. Honegger
Konstantin Schurholt
Damian Borth
31
4
0
26 Apr 2023
Emergent and Predictable Memorization in Large Language Models
Emergent and Predictable Memorization in Large Language Models
Stella Biderman
USVSN Sai Prashanth
Lintang Sutawika
Hailey Schoelkopf
Quentin G. Anthony
Shivanshu Purohit
Edward Raf
29
116
0
21 Apr 2023
Progressive-Hint Prompting Improves Reasoning in Large Language Models
Progressive-Hint Prompting Improves Reasoning in Large Language Models
Chuanyang Zheng
Zhengying Liu
Enze Xie
Zhenguo Li
Yu Li
LLMAG
ReLM
LRM
41
103
0
19 Apr 2023
STen: Productive and Efficient Sparsity in PyTorch
STen: Productive and Efficient Sparsity in PyTorch
Andrei Ivanov
Nikoli Dryden
Tal Ben-Nun
Saleh Ashkboos
Torsten Hoefler
34
4
0
15 Apr 2023
On the Opportunities and Challenges of Foundation Models for Geospatial
  Artificial Intelligence
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence
Gengchen Mai
Weiming Huang
Jin Sun
Suhang Song
Deepak Mishra
...
Yingjie Hu
Chris Cundy
Ziyuan Li
Rui Zhu
Ni Lao
AI4CE
32
123
0
13 Apr 2023
Multilingual Machine Translation with Large Language Models: Empirical
  Results and Analysis
Multilingual Machine Translation with Large Language Models: Empirical Results and Analysis
Wenhao Zhu
Hongyi Liu
Qingxiu Dong
Jingjing Xu
Shujian Huang
Lingpeng Kong
Jiajun Chen
Lei Li
LRM
31
140
0
10 Apr 2023
OpenAGI: When LLM Meets Domain Experts
OpenAGI: When LLM Meets Domain Experts
Yingqiang Ge
Wenyue Hua
Kai Mei
Jianchao Ji
Juntao Tan
Shuyuan Xu
Zelong Li
Yongfeng Zhang
VLM
LRM
38
211
0
10 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature
  Review
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
30
41
0
07 Apr 2023
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean
  Field Neural Networks
Dynamics of Finite Width Kernel and Prediction Fluctuations in Mean Field Neural Networks
Blake Bordelon
Cengiz Pehlevan
MLT
38
29
0
06 Apr 2023
On the Pareto Front of Multilingual Neural Machine Translation
On the Pareto Front of Multilingual Neural Machine Translation
Liang Chen
Shuming Ma
Dongdong Zhang
Furu Wei
Baobao Chang
MoE
23
5
0
06 Apr 2023
Inductive biases in deep learning models for weather prediction
Inductive biases in deep learning models for weather prediction
Jannik Thümmel
Matthias Karlbauer
S. Otte
C. Zarfl
Georg Martius
...
Thomas Scholten
Ulrich Friedrich
V. Wulfmeyer
B. Goswami
Martin Volker Butz
AI4CE
43
5
0
06 Apr 2023
Segment Anything
Segment Anything
A. Kirillov
Eric Mintun
Nikhila Ravi
Hanzi Mao
Chloe Rolland
...
Spencer Whitehead
Alexander C. Berg
Wan-Yen Lo
Piotr Dollár
Ross B. Girshick
MLLM
VLM
57
6,822
0
05 Apr 2023
Safety Analysis in the Era of Large Language Models: A Case Study of
  STPA using ChatGPT
Safety Analysis in the Era of Large Language Models: A Case Study of STPA using ChatGPT
Yi Qi
Xingyu Zhao
Siddartha Khastgir
Xiaowei Huang
24
14
0
03 Apr 2023
GPT-4 can pass the Korean National Licensing Examination for Korean
  Medicine Doctors
GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors
Dongyeop Jang
Tae-Rim Yun
Choong-Yeol Lee
Young-Kyu Kwon
Chang-Eop Kim
ELM
LM&MA
32
26
0
31 Mar 2023
BloombergGPT: A Large Language Model for Finance
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
76
786
0
30 Mar 2023
Active Self-Supervised Learning: A Few Low-Cost Relationships Are All
  You Need
Active Self-Supervised Learning: A Few Low-Cost Relationships Are All You Need
Vivien A. Cabannes
Léon Bottou
Yann LeCun
Randall Balestriero
48
13
0
27 Mar 2023
Text-to-Image Diffusion Models are Zero-Shot Classifiers
Text-to-Image Diffusion Models are Zero-Shot Classifiers
Kevin Clark
P. Jaini
DiffM
VLM
38
107
0
27 Mar 2023
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training
  Efficiency
Sparse-IFT: Sparse Iso-FLOP Transformations for Maximizing Training Efficiency
Vithursan Thangarasa
Shreyas Saxena
Abhay Gupta
Sean Lie
31
3
0
21 Mar 2023
Capabilities of GPT-4 on Medical Challenge Problems
Capabilities of GPT-4 on Medical Challenge Problems
Harsha Nori
Nicholas King
S. McKinney
Dean Carignan
Eric Horvitz
LM&MA
ELM
AI4MH
41
766
0
20 Mar 2023
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of
  Large Language Models
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Tyna Eloundou
Sam Manning
Pamela Mishkin
Daniel Rock
ELM
35
380
0
17 Mar 2023
CoLT5: Faster Long-Range Transformers with Conditional Computation
CoLT5: Faster Long-Range Transformers with Conditional Computation
Joshua Ainslie
Tao Lei
Michiel de Jong
Santiago Ontañón
Siddhartha Brahma
...
Mandy Guo
James Lee-Thorp
Yi Tay
Yun-hsuan Sung
Sumit Sanghai
LLMAG
33
63
0
17 Mar 2023
Neural Architecture Search for Effective Teacher-Student Knowledge
  Transfer in Language Models
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi
Takuma Udagawa
Michele Merler
Rameswar Panda
Yousef El-Kurdi
Bishwaranjan Bhattacharjee
30
7
0
16 Mar 2023
Simfluence: Modeling the Influence of Individual Training Examples by
  Simulating Training Runs
Simfluence: Modeling the Influence of Individual Training Examples by Simulating Training Runs
Kelvin Guu
Albert Webson
Ellie Pavlick
Lucas Dixon
Ian Tenney
Tolga Bolukbasi
TDI
70
33
0
14 Mar 2023
Complement Sparsification: Low-Overhead Model Pruning for Federated
  Learning
Complement Sparsification: Low-Overhead Model Pruning for Federated Learning
Xiaopeng Jiang
Cristian Borcea
FedML
34
15
0
10 Mar 2023
An Overview on Language Models: Recent Developments and Outlook
An Overview on Language Models: Recent Developments and Outlook
Chengwei Wei
Yun Cheng Wang
Bin Wang
C.-C. Jay Kuo
25
42
0
10 Mar 2023
Exploring Efficient-Tuned Learning Audio Representation Method from
  BriVL
Exploring Efficient-Tuned Learning Audio Representation Method from BriVL
Sen Fang
Yang Wu
Bowen Gao
Jingwen Cai
T. Teoh
DiffM
29
1
0
08 Mar 2023
Real-World Humanoid Locomotion with Reinforcement Learning
Real-World Humanoid Locomotion with Reinforcement Learning
Ilija Radosavovic
Tete Xiao
Bike Zhang
Trevor Darrell
Jitendra Malik
K. Sreenath
26
124
0
06 Mar 2023
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in
  Tencent
Angel-PTM: A Scalable and Economical Large-scale Pre-training System in Tencent
Xiaonan Nie
Yi Liu
Fangcheng Fu
Jinbao Xue
Dian Jiao
Xupeng Miao
Yangyu Tao
Tengjiao Wang
MoE
31
16
0
06 Mar 2023
Understanding plasticity in neural networks
Understanding plasticity in neural networks
Clare Lyle
Zeyu Zheng
Evgenii Nikishin
Bernardo Avila-Pires
Razvan Pascanu
Will Dabney
AI4CE
35
97
0
02 Mar 2023
Communication-efficient Federated Learning with Single-Step Synthetic
  Features Compressor for Faster Convergence
Communication-efficient Federated Learning with Single-Step Synthetic Features Compressor for Faster Convergence
Yuhao Zhou
Mingjia Shi
Yuanxi Li
Qing Ye
Yanan Sun
Jiancheng Lv
18
3
0
27 Feb 2023
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
The Dormant Neuron Phenomenon in Deep Reinforcement Learning
Ghada Sokar
Rishabh Agarwal
Pablo Samuel Castro
Utku Evci
CLL
51
88
0
24 Feb 2023
MUX-PLMs: Data Multiplexing for High-throughput Language Models
MUX-PLMs: Data Multiplexing for High-throughput Language Models
Vishvak Murahari
A. Deshpande
Carlos E. Jimenez
Izhak Shafran
Mingqiu Wang
Yuan Cao
Karthik Narasimhan
MoE
26
5
0
24 Feb 2023
Reward Learning as Doubly Nonparametric Bandits: Optimal Design and
  Scaling Laws
Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws
Kush S. Bhatia
Wenshuo Guo
Jacob Steinhardt
19
0
0
23 Feb 2023
Poisoning Web-Scale Training Datasets is Practical
Poisoning Web-Scale Training Datasets is Practical
Nicholas Carlini
Matthew Jagielski
Christopher A. Choquette-Choo
Daniel Paleka
Will Pearce
Hyrum S. Anderson
Andreas Terzis
Kurt Thomas
Florian Tramèr
SILM
31
182
0
20 Feb 2023
Scaling Laws for Multilingual Neural Machine Translation
Scaling Laws for Multilingual Neural Machine Translation
Patrick Fernandes
Behrooz Ghorbani
Xavier Garcia
Markus Freitag
Orhan Firat
38
29
0
19 Feb 2023
Cluster-Guided Label Generation in Extreme Multi-Label Classification
Cluster-Guided Label Generation in Extreme Multi-Label Classification
Taehee Jung
Joo-Kyung Kim
Sungjin Lee
Dongyeop Kang
VLM
24
6
0
17 Feb 2023
Auditing large language models: a three-layered approach
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
48
194
0
16 Feb 2023
Previous
123...131415...181920
Next