Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2001.08361
Cited By
Scaling Laws for Neural Language Models
23 January 2020
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Scaling Laws for Neural Language Models"
50 / 982 papers shown
Title
Revisiting Offline Compression: Going Beyond Factorization-based Methods for Transformer Language Models
Mohammadreza Banaei
Klaudia Bałazy
Artur Kasymov
R. Lebret
Jacek Tabor
Karl Aberer
OffRL
21
0
0
08 Feb 2023
Multipath agents for modular multitask ML systems
Andrea Gesmundo
28
1
0
06 Feb 2023
Scaling laws for single-agent reinforcement learning
Jacob Hilton
Jie Tang
John Schulman
22
20
0
31 Jan 2023
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue
Valerii Likhosherstov
Anurag Arnab
N. Houlsby
Mostafa Dehghani
Yang You
31
18
0
30 Jan 2023
A Closer Look at Few-shot Classification Again
Xu Luo
Hao Wu
Ji Zhang
Lianli Gao
Jing Xu
Jingkuan Song
24
48
0
28 Jan 2023
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
Policy-Value Alignment and Robustness in Search-based Multi-Agent Learning
Niko A. Grupen
M. Hanlon
Alexis Hao
Daniel D. Lee
B. Selman
24
0
0
27 Jan 2023
An Experimental Study on Pretraining Transformers from Scratch for IR
Carlos Lassance
Hervé Déjean
S. Clinchant
28
11
0
25 Jan 2023
A Domain-Agnostic Approach for Characterization of Lifelong Learning Systems
Megan M. Baker
Alexander New
Mario Aguilar-Simon
Ziad Al-Halah
Sébastien M. R. Arnold
...
Zifan Xu
A. Yanguas-Gil
Harel Yedidsion
Shangqun Yu
Gautam K. Vallabha
30
15
0
18 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
LM&Ro
OffRL
AI4CE
LRM
38
108
0
18 Jan 2023
Prompting Large Language Model for Machine Translation: A Case Study
Biao Zhang
Barry Haddow
Alexandra Birch
LRM
21
274
0
17 Jan 2023
Dissociating language and thought in large language models
Kyle Mahowald
Anna A. Ivanova
I. Blank
Nancy Kanwisher
J. Tenenbaum
Evelina Fedorenko
ELM
ReLM
29
209
0
16 Jan 2023
Data Distillation: A Survey
Noveen Sachdeva
Julian McAuley
DD
45
73
0
11 Jan 2023
Evaluation for Change
Rishi Bommasani
ELM
40
0
0
20 Dec 2022
Beyond Triplet: Leveraging the Most Data for Multimodal Machine Translation
Yaoming Zhu
Zewei Sun
Shanbo Cheng
Yuyang Huang
Liwei Wu
Mingxuan Wang
28
10
0
20 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
22
364
0
19 Dec 2022
I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation
Chandra Bhagavatula
Jena D. Hwang
Doug Downey
Ronan Le Bras
Ximing Lu
Lianhui Qin
Keisuke Sakaguchi
Swabha Swayamdipta
Peter West
Yejin Choi
23
34
0
19 Dec 2022
Offline Reinforcement Learning for Visual Navigation
Dhruv Shah
Arjun Bhorkar
Hrish Leen
Ilya Kostrikov
Nicholas Rhinehart
Sergey Levine
OffRL
24
29
0
16 Dec 2022
Fixing MoE Over-Fitting on Low-Resource Languages in Multilingual Machine Translation
Maha Elbayad
Anna Y. Sun
Shruti Bhosale
MoE
51
8
0
15 Dec 2022
General-Purpose In-Context Learning by Meta-Learning Transformers
Louis Kirsch
James Harrison
Jascha Narain Sohl-Dickstein
Luke Metz
40
72
0
08 Dec 2022
Deep Incubation: Training Large Models by Divide-and-Conquering
Zanlin Ni
Yulin Wang
Jiangwei Yu
Haojun Jiang
Yu Cao
Gao Huang
VLM
18
11
0
08 Dec 2022
DeepSpeed Data Efficiency: Improving Deep Learning Model Quality and Training Efficiency via Efficient Data Sampling and Routing
Conglong Li
Z. Yao
Xiaoxia Wu
Minjia Zhang
Connor Holmes
Cheng Li
Yuxiong He
27
24
0
07 Dec 2022
Pretrained Diffusion Models for Unified Human Motion Synthesis
Jianxin Ma
Shuai Bai
Chang Zhou
DiffM
VGen
AI4CE
33
31
0
06 Dec 2022
Languages You Know Influence Those You Learn: Impact of Language Characteristics on Multi-Lingual Text-to-Text Transfer
Benjamin Muller
Deepanshu Gupta
Siddharth Patwardhan
J. Fauconnier
David Vandyke
Sachin Agarwal
41
5
0
04 Dec 2022
RAMP: A Flat Nanosecond Optical Network and MPI Operations for Distributed Deep Learning Systems
Alessandro Ottino
Joshua L. Benjamin
G. Zervas
30
7
0
28 Nov 2022
Understanding BLOOM: An empirical study on diverse NLP tasks
Parag Dakle
Sai Krishna Rallabandi
Preethi Raghavan
AI4CE
36
3
0
27 Nov 2022
Galvatron: Efficient Transformer Training over Multiple GPUs Using Automatic Parallelism
Xupeng Miao
Yujie Wang
Youhe Jiang
Chunan Shi
Xiaonan Nie
Hailin Zhang
Bin Cui
GNN
MoE
39
60
0
25 Nov 2022
Powderworld: A Platform for Understanding Generalization via Rich Task Distributions
Kevin Frans
Phillip Isola
OffRL
44
9
0
23 Nov 2022
Word-Level Representation From Bytes For Language Modeling
Chul Lee
Qipeng Guo
Xipeng Qiu
15
1
0
23 Nov 2022
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
Wenhu Chen
Xueguang Ma
Xinyi Wang
William W. Cohen
ReLM
ReCod
LRM
69
735
0
22 Nov 2022
Coreference Resolution through a seq2seq Transition-Based System
Bernd Bohnet
Chris Alberti
Michael Collins
28
39
0
22 Nov 2022
Compiler Provenance Recovery for Multi-CPU Architectures Using a Centrifuge Mechanism
Yuhei Otsubo
Akira Otsuka
M. Mimura
29
3
0
22 Nov 2022
Metadata Might Make Language Models Better
K. Beelen
Daniel Alexander van Strien
AI4CE
32
0
0
18 Nov 2022
A Dataset for Hyper-Relational Extraction and a Cube-Filling Approach
Yew Ken Chia
Lidong Bing
Sharifah Mahani Aljunied
Luo Si
Soujanya Poria
32
14
0
18 Nov 2022
Efficient Transformers with Dynamic Token Pooling
Piotr Nawrot
J. Chorowski
Adrian Lañcucki
E. Ponti
20
42
0
17 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
87
675
0
14 Nov 2022
Striving for data-model efficiency: Identifying data externalities on group performance
Esther Rolf
Ben Packer
Alex Beutel
Fernando Diaz
TDI
27
2
0
11 Nov 2022
Breadth-First Pipeline Parallelism
J. Lamy-Poirier
GNN
MoE
AI4CE
28
1
0
11 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
36
657
0
10 Nov 2022
Efficiently Scaling Transformer Inference
Reiner Pope
Sholto Douglas
Aakanksha Chowdhery
Jacob Devlin
James Bradbury
Anselm Levskaya
Jonathan Heek
Kefan Xiao
Shivani Agrawal
J. Dean
34
295
0
09 Nov 2022
BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
BigScience Workshop
:
Teven Le Scao
Angela Fan
Christopher Akiki
...
Zhongli Xie
Zifan Ye
M. Bras
Younes Belkada
Thomas Wolf
VLM
116
2,310
0
09 Nov 2022
Harmonizing the object recognition strategies of deep neural networks with humans
Thomas Fel
Ivan Felipe
Drew Linsley
Thomas Serre
36
71
0
08 Nov 2022
COPEN: Probing Conceptual Knowledge in Pre-trained Language Models
Hao Peng
Xiaozhi Wang
Shengding Hu
Hailong Jin
Lei Hou
Juanzi Li
Zhiyuan Liu
Qun Liu
18
22
0
08 Nov 2022
Astronomia ex machina: a history, primer, and outlook on neural networks in astronomy
Michael J. Smith
James E. Geach
35
32
0
07 Nov 2022
Inverse scaling can become U-shaped
Jason W. Wei
Najoung Kim
Yi Tay
Quoc V. Le
LRM
21
60
0
03 Nov 2022
Large Language Models Are Human-Level Prompt Engineers
Yongchao Zhou
Andrei Ioan Muresanu
Ziwen Han
Keiran Paster
Silviu Pitis
Harris Chan
Jimmy Ba
ALM
LLMAG
21
829
0
03 Nov 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Xuan Li
VLM
MoE
29
803
0
02 Nov 2022
A Solvable Model of Neural Scaling Laws
A. Maloney
Daniel A. Roberts
J. Sully
36
51
0
30 Oct 2022
COCO-DR: Combating Distribution Shifts in Zero-Shot Dense Retrieval with Contrastive and Distributionally Robust Learning
Yue Yu
Chenyan Xiong
Si Sun
Chao Zhang
Arnold Overwijk
VLM
OOD
50
22
0
27 Oct 2022
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
30
74
0
26 Oct 2022
Previous
1
2
3
...
14
15
16
...
18
19
20
Next