Scaling Laws for Neural Language Models

23 January 2020
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
arXiv:2001.08361

Papers citing "Scaling Laws for Neural Language Models"

32 of 982 citing papers shown
Improved Denoising Diffusion Probabilistic Models
Alex Nichol, Prafulla Dhariwal · DiffM · 3,526 citations · 18 Feb 2021
Proof Artifact Co-training for Theorem Proving with Language Models
Jesse Michael Han, Jason M. Rute, Yuhuai Wu, Edward W. Ayers, Stanislas Polu · AIMat · 120 citations · 11 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan · VLM · 512 citations · 11 Feb 2021
Learning Curve Theory
Marcus Hutter · 58 citations · 08 Feb 2021
Embodied Intelligence via Learning and Evolution
Agrim Gupta, Silvio Savarese, Surya Ganguli, Li Fei-Fei · AI4CE · 230 citations · 03 Feb 2021
Mind the Gap: Assessing Temporal Generalization in Neural Language Models
Angeliki Lazaridou, A. Kuncoro, E. Gribovskaya, Devang Agrawal, Adam Liska, ..., Sebastian Ruder, Dani Yogatama, Kris Cao, Susannah Young, Phil Blunsom · VLM · 207 citations · 03 Feb 2021
Emergent Unfairness in Algorithmic Fairness-Accuracy Trade-Off Research
A. Feder Cooper, Ellen Abrams · FaML · 60 citations · 01 Feb 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He · MoE · 414 citations · 18 Jan 2021
Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity
W. Fedus, Barret Zoph, Noam M. Shazeer · MoE · 2,075 citations · 11 Jan 2021
Reservoir Transformers
Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela · 17 citations · 30 Dec 2020
BERT Goes Shopping: Comparing Distributional Models for Product Representations
Federico Bianchi, Bingqing Yu, Jacopo Tagliabue · 15 citations · 17 Dec 2020
Understanding Capacity-Driven Scale-Out Neural Recommendation Inference
Michael Lui, Yavuz Yetim, Özgür Özkan, Zhuoran Zhao, Shin-Yeh Tsai, Carole-Jean Wu, Mark Hempstead · GNN, BDL, LRM · 51 citations · 04 Nov 2020
Scaling Laws for Autoregressive Generative Modeling
T. Henighan, Jared Kaplan, Mor Katz, Mark Chen, Christopher Hesse, ..., Nick Ryder, Daniel M. Ziegler, John Schulman, Dario Amodei, Sam McCandlish · 405 citations · 28 Oct 2020
Are wider nets better given the same number of parameters?
A. Golubeva, Behnam Neyshabur, Guy Gur-Ari · 44 citations · 27 Oct 2020
Rethinking embedding coupling in pre-trained language models
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder · 142 citations · 24 Oct 2020
Deep Learning is Singular, and That's Good
Daniel Murfet, Susan Wei, Biwei Huang, Hui Li, Jesse Gell-Redman, T. Quella · UQCV · 26 citations · 22 Oct 2020
Reformulating Unsupervised Style Transfer as Paraphrase Generation
Kalpesh Krishna, John Wieting, Mohit Iyyer · 237 citations · 12 Oct 2020
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics
Swabha Swayamdipta, Roy Schwartz, Nicholas Lourie, Yizhong Wang, Hannaneh Hajishirzi, Noah A. Smith, Yejin Choi · 429 citations · 22 Sep 2020
Evaluating representations by the complexity of learning low-loss predictors
William F. Whitney, M. Song, David Brandfonbrener, Jaan Altosaar, Kyunghyun Cho · 23 citations · 15 Sep 2020
Statistical Query Algorithms and Low-Degree Tests Are Almost Equivalent
Matthew Brennan, Guy Bresler, Samuel B. Hopkins, J. Li, T. Schramm · 62 citations · 13 Sep 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler · 131 citations · 30 Jun 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, M. Krikun, Noam M. Shazeer, Z. Chen · MoE · 1,106 citations · 30 Jun 2020
The Depth-to-Width Interplay in Self-Attention
Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua · 45 citations · 22 Jun 2020
On the Predictability of Pruning Across Scales
Jonathan S. Rosenfeld, Jonathan Frankle, Michael Carbin, Nir Shavit · 37 citations · 18 Jun 2020
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
Marcin Andrychowicz, Anton Raichuk, Piotr Stańczyk, Manu Orsini, Sertan Girgin, ..., M. Geist, Olivier Pietquin, Marcin Michalski, Sylvain Gelly, Olivier Bachem · OffRL · 213 citations · 10 Jun 2020
Predictive Coding Approximates Backprop along Arbitrary Computation Graphs
Beren Millidge, Alexander Tschantz, Christopher L. Buckley · 118 citations · 07 Jun 2020
Language Models are Few-Shot Learners
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei · BDL · 40,023 citations · 28 May 2020
The Cost of Training NLP Models: A Concise Overview
Or Sharir, Barak Peleg, Y. Shoham · 209 citations · 19 Apr 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin, Anton I. Gusev · FedML · 48 citations · 10 Feb 2020
Big Transfer (BiT): General Visual Representation Learning
Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, J. Puigcerver, Jessica Yung, Sylvain Gelly, N. Houlsby · MQ · 1,183 citations · 24 Dec 2019
Speech Intention Understanding in a Head-final Language: A Disambiguation Utilizing Intonation-dependency
Won Ik Cho, Hyeon Seung Lee, J. Yoon, Seokhwan Kim, N. Kim · 5 citations · 10 Nov 2018
Quantifying the probable approximation error of probabilistic inference programs
Marco F. Cusumano-Towner, Vikash K. Mansinghka · 5 citations · 31 May 2016