Scaling Laws for Neural Language Models (arXiv:2001.08361)

23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei

Papers citing "Scaling Laws for Neural Language Models"

50 / 982 papers shown
Is the Number of Trainable Parameters All That Actually Matters?
A. Chatelain
Amine Djeghri
Daniel Hesslow
Julien Launay
Iacopo Poli
51
7
0
24 Sep 2021
Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers
Yi Tay
Mostafa Dehghani
J. Rao
W. Fedus
Samira Abnar
Hyung Won Chung
Sharan Narang
Dani Yogatama
Ashish Vaswani
Donald Metzler
206
110
0
22 Sep 2021
Primer: Searching for Efficient Transformers for Language Modeling
David R. So
Wojciech Mańke
Hanxiao Liu
Zihang Dai
Noam M. Shazeer
Quoc V. Le
VLM
88
152
0
17 Sep 2021
Image Captioning for Effective Use of Language Models in Knowledge-Based Visual Question Answering
Ander Salaberria
Gorka Azkune
Oier López de Lacalle
Aitor Soroa Etxabe
Eneko Agirre
30
59
0
15 Sep 2021
Compute and Energy Consumption Trends in Deep Learning Inference
Radosvet Desislavov
Fernando Martínez-Plumed
José Hernández-Orallo
35
113
0
12 Sep 2021
What Changes Can Large-scale Language Models Bring? Intensive Study on HyperCLOVA: Billions-scale Korean Generative Pretrained Transformers
Boseop Kim
Hyoungseok Kim
Sang-Woo Lee
Gichang Lee
Donghyun Kwak
...
Jaewook Kang
Inho Kang
Jung-Woo Ha
W. Park
Nako Sung
VLM
249
121
0
10 Sep 2021
Robust fine-tuning of zero-shot models
Mitchell Wortsman
Gabriel Ilharco
Jong Wook Kim
Mike Li
Simon Kornblith
...
Raphael Gontijo-Lopes
Hannaneh Hajishirzi
Ali Farhadi
Hongseok Namkoong
Ludwig Schmidt
VLM
61
689
0
04 Sep 2021
Why and How Governments Should Monitor AI Development
Jess Whittlestone
Jack Clark
19
29
0
28 Aug 2021
Design and Scaffolded Training of an Efficient DNN Operator for Computer Vision on the Edge
Vinod Ganesan
Pratyush Kumar
39
2
0
25 Aug 2021
A Scaling Law for Synthetic-to-Real Transfer: How Much Is Your Pre-training Effective?
Hiroaki Mikami
Kenji Fukumizu
Shogo Murai
Shuji Suzuki
Yuta Kikuchi
Taiji Suzuki
S. Maeda
Kohei Hayashi
40
12
0
25 Aug 2021
Curriculum learning for language modeling
Daniel Fernando Campos
8
32
0
04 Aug 2021
Simple, Fast, and Flexible Framework for Matrix Completion with Infinite Width Neural Networks
Adityanarayanan Radhakrishnan
George Stefanakis
M. Belkin
Caroline Uhler
30
25
0
31 Jul 2021
Dataset Distillation with Infinitely Wide Convolutional Networks
Timothy Nguyen
Roman Novak
Lechao Xiao
Jaehoon Lee
DD
46
229
0
27 Jul 2021
Codified audio language modeling learns useful representations for music information retrieval
Rodrigo Castellon
Chris Donahue
Percy Liang
78
86
0
12 Jul 2021
Transflower: probabilistic autoregressive dance generation with multimodal attention
Guillermo Valle Pérez
G. Henter
Jonas Beskow
A. Holzapfel
Pierre-Yves Oudeyer
Simon Alexanderson
30
42
0
25 Jun 2021
NodePiece: Compositional and Parameter-Efficient Representations of Large Knowledge Graphs
Mikhail Galkin
E. Denis
Jiapeng Wu
William L. Hamilton
OCL
25
86
0
23 Jun 2021
Revisiting Model Stitching to Compare Neural Representations
Yamini Bansal
Preetum Nakkiran
Boaz Barak
FedML
44
105
0
14 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
40
815
0
14 Jun 2021
Disentangling the Roles of Curation, Data-Augmentation and the Prior in the Cold Posterior Effect
Lorenzo Noci
Kevin Roth
Gregor Bachmann
Sebastian Nowozin
Thomas Hofmann
CML
30
23
0
11 Jun 2021
Scaling Laws for Acoustic Models
J. Droppo
Oguz H. Elibol
13
22
0
11 Jun 2021
Neural Symbolic Regression that Scales
Luca Biggio
Tommaso Bendinelli
Alexander Neitz
Aurelien Lucchi
Giambattista Parascandolo
40
170
0
11 Jun 2021
GraphiT: Encoding Graph Structure in Transformers
Grégoire Mialon
Dexiong Chen
Margot Selosse
Julien Mairal
22
164
0
10 Jun 2021
Scaling Vision Transformers
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
52
1,060
0
08 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
MoE
13
8
0
04 Jun 2021
ERNIE-Tiny : A Progressive Distillation Framework for Pretrained Transformer Compression
Weiyue Su
Xuyi Chen
Shi Feng
Jiaxiang Liu
Weixin Liu
Yu Sun
Hao Tian
Hua-Hong Wu
Haifeng Wang
26
13
0
04 Jun 2021
What Matters for Adversarial Imitation Learning?
Manu Orsini
Anton Raichuk
Léonard Hussenot
Damien Vincent
Robert Dadashi
Sertan Girgin
M. Geist
Olivier Bachem
Olivier Pietquin
Marcin Andrychowicz
42
77
0
01 Jun 2021
PIGLeT: Language Grounding Through Neuro-Symbolic Interaction in a 3D World
Rowan Zellers
Ari Holtzman
Matthew E. Peters
Roozbeh Mottaghi
Aniruddha Kembhavi
Ali Farhadi
Yejin Choi
19
68
0
01 Jun 2021
One4all User Representation for Recommender Systems in E-commerce
Kyuyong Shin
Hanock Kwak
KyungHyun Kim
Minkyu Kim
Young-Jin Park
Jisu Jeong
Seungjae Jung
28
27
0
24 May 2021
Pay Attention to MLPs
Hanxiao Liu
Zihang Dai
David R. So
Quoc V. Le
AI4CE
57
651
0
17 May 2021
Which transformer architecture fits my data? A vocabulary bottleneck in self-attention
Noam Wies
Yoav Levine
Daniel Jannai
Amnon Shashua
40
20
0
09 May 2021
HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish
Robert Mroczkowski
Piotr Rybak
Alina Wróblewska
Ireneusz Gawlik
28
81
0
04 May 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Bo-wen Li
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Yifan Jiang
Min Ma
Junwen Bai
CLL
28
76
0
30 Apr 2021
Language Models are Few-Shot Butlers
Vincent Micheli
François Fleuret
9
31
0
16 Apr 2021
Generating Bug-Fixes Using Pretrained Transformers
Dawn Drain
Chen Henry Wu
Alexey Svyatkovskiy
Neel Sundaresan
15
50
0
16 Apr 2021
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
35
367
0
16 Apr 2021
How to Train BERT with an Academic Budget
Peter Izsak
Moshe Berchansky
Omer Levy
12
113
0
15 Apr 2021
FastMoE: A Fast Mixture-of-Expert Training System
Jiaao He
J. Qiu
Aohan Zeng
Zhilin Yang
Jidong Zhai
Jie Tang
ALM
MoE
22
94
0
24 Mar 2021
UNICORN on RAINBOW: A Universal Commonsense Reasoning Model on a New Multitask Benchmark
Nicholas Lourie
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
LRM
22
137
0
24 Mar 2021
How to decay your learning rate
Aitor Lewkowycz
36
24
0
23 Mar 2021
The Shape of Learning Curves: a Review
T. Viering
Marco Loog
18
122
0
19 Mar 2021
Is it enough to optimize CNN architectures on ImageNet?
Lukas Tuggener
Jürgen Schmidhuber
Thilo Stadelmann
25
23
0
16 Mar 2021
Revisiting ResNets: Improved Training and Scaling Strategies
Irwan Bello
W. Fedus
Xianzhi Du
E. D. Cubuk
A. Srinivas
Nayeon Lee
Jonathon Shlens
Barret Zoph
29
297
0
13 Mar 2021
Multiple Instance Captioning: Learning Representations from Histopathology Textbooks and Articles
Jevgenij Gamper
Nasir M. Rajpoot
24
62
0
08 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
84
1,840
0
05 Mar 2021
Generating Images with Sparse Representations
C. Nash
Jacob Menick
Sander Dieleman
Peter W. Battaglia
33
199
0
05 Mar 2021
Training a First-Order Theorem Prover from Synthetic Data
Vlad Firoiu
Eser Aygun
Ankit Anand
Zafarali Ahmed
Xavier Glorot
Laurent Orseau
Lei Zhang
Doina Precup
Shibl Mourad
NAI
21
13
0
05 Mar 2021
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Max Ryabinin
Eduard A. Gorbunov
Vsevolod Plokhotnyuk
Gennady Pekhimenko
35
32
0
04 Mar 2021
Perceiver: General Perception with Iterative Attention
Andrew Jaegle
Felix Gimeno
Andrew Brock
Andrew Zisserman
Oriol Vinyals
João Carreira
VLM
ViT
MDE
77
973
0
04 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
120
27,772
0
26 Feb 2021
Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning
Victor Campos
Pablo Sprechmann
S. Hansen
André Barreto
Steven Kapturowski
Alex Vitvitskyi
Adria Puigdomenech Badia
Charles Blundell
OffRL
OnRL
33
25
0
24 Feb 2021