Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.14165
Cited By
Language Models are Few-Shot Learners
28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Ma-teusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Language Models are Few-Shot Learners"
45 / 11,595 papers shown
Title
Memory-Efficient Pipeline-Parallel DNN Training
Deepak Narayanan
Amar Phanishayee
Kaiyu Shi
Xie Chen
Matei A. Zaharia
MoE
36
212
0
16 Jun 2020
Surrogate gradients for analog neuromorphic computing
Benjamin Cramer
Sebastian Billaudelle
Simeon Kanya
Aron Leibfried
Andreas Grubl
...
Korbinian Schreiber
Yannik Stradmann
Johannes Weis
Johannes Schemmel
Friedemann Zenke
24
107
0
12 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
30
433
0
11 Jun 2020
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
87
1,655
0
08 Jun 2020
The Lipschitz Constant of Self-Attention
Hyunjik Kim
George Papamakarios
A. Mnih
14
135
0
08 Jun 2020
An Overview of Neural Network Compression
James OÑeill
AI4CE
45
98
0
05 Jun 2020
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
AAML
64
2,631
0
05 Jun 2020
A Survey on Transfer Learning in Natural Language Processing
Zaid Alyafeai
Maged S. Alshaibani
Irfan Ahmad
30
72
0
31 May 2020
Predict-then-Decide: A Predictive Approach for Wait or Answer Task in Dialogue Systems
Zehao Lin
Shaobo Cui
Guodun Li
Xiaoming Kang
Feng Ji
Feng-Lin Li
Zhongzhou Zhao
Haiqing Chen
Yin Zhang
34
1
0
27 May 2020
Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction
L. Rasmy
Yang Xiang
Z. Xie
Cui Tao
Degui Zhi
AI4MH
LM&MA
29
659
0
22 May 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
32
472
0
15 May 2020
How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Tal Linzen
220
190
0
03 May 2020
Reinforcement Learning with Augmented Data
Michael Laskin
Kimin Lee
Adam Stooke
Lerrel Pinto
Pieter Abbeel
A. Srinivas
OffRL
20
648
0
30 Apr 2020
Explainable Deep Learning: A Field Guide for the Uninitiated
Gabrielle Ras
Ning Xie
Marcel van Gerven
Derek Doran
AAML
XAI
46
371
0
30 Apr 2020
Deep Learning for Time Series Forecasting: Tutorial and Literature Survey
Konstantinos Benidis
Syama Sundar Rangapuram
Valentin Flunkert
Bernie Wang
Danielle C. Maddix
...
David Salinas
Lorenzo Stella
François-Xavier Aubet
Laurent Callot
Tim Januschowski
AI4TS
27
176
0
21 Apr 2020
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
24
351
0
21 Apr 2020
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Chunyuan Li
Xiang Gao
Yuan Li
Baolin Peng
Xiujun Li
Yizhe Zhang
Jianfeng Gao
SSL
DRL
32
181
0
05 Apr 2020
A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction
Zitao Chen
Guanpeng Li
Karthik Pattabiraman
AAML
AI4CE
28
17
0
30 Mar 2020
Machine learning as a model for cultural learning: Teaching an algorithm what it means to be fat
Alina Arseniev-Koehler
J. Foster
43
46
0
24 Mar 2020
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
246
1,454
0
18 Mar 2020
ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
30
276
0
10 Mar 2020
Iterative Averaging in the Quest for Best Test Error
Diego Granziol
Xingchen Wan
Samuel Albanie
Stephen J. Roberts
15
3
0
02 Mar 2020
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Chaoyue Liu
Libin Zhu
M. Belkin
ODL
17
248
0
29 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
27
48
0
10 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
264
4,532
0
23 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,591
0
21 Jan 2020
Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data
E. Steinberg
Kenneth Jung
Jason Alan Fries
Conor K. Corbin
Stephen R. Pfohl
N. Shah
29
103
0
06 Jan 2020
Fast and energy-efficient neuromorphic deep learning with first-spike times
Julian Goltz
Laura Kriener
A. Baumbach
Sebastian Billaudelle
O. Breitwieser
...
Á. F. Kungl
Walter Senn
Johannes Schemmel
K. Meier
Mihai A. Petrovici
35
126
0
24 Dec 2019
Blockwise Self-Attention for Long Document Understanding
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
11
252
0
07 Nov 2019
Discovering the Compositional Structure of Vector Representations with Role Learning Networks
Paul Soulos
R. Thomas McCoy
Tal Linzen
P. Smolensky
CoGe
29
43
0
21 Oct 2019
Demon: Improved Neural Network Training with Momentum Decay
John Chen
Cameron R. Wolfe
Zhaoqi Li
Anastasios Kyrillidis
ODL
24
15
0
11 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
59
70
0
09 Oct 2019
DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators
Lu Lu
Pengzhan Jin
George Karniadakis
43
2,035
0
08 Oct 2019
Soft-Label Dataset Distillation and Text Dataset Distillation
Ilia Sucholutsky
Matthias Schonlau
DD
33
132
0
06 Oct 2019
Distributed Learning of Deep Neural Networks using Independent Subnet Training
John Shelton Hyatt
Cameron R. Wolfe
Michael Lee
Yuxin Tang
Anastasios Kyrillidis
Christopher M. Jermaine
OOD
29
35
0
04 Oct 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,620
0
18 Sep 2019
Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi
M. Patwary
Raul Puri
P. LeGresley
Jared Casper
Bryan Catanzaro
MoE
245
1,836
0
17 Sep 2019
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
225
621
0
03 Sep 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
25
133
0
22 Jul 2019
An Information Theoretic Interpretation to Deep Neural Networks
Shao-Lun Huang
Xiangxiang Xu
Lizhong Zheng
G. Wornell
FAtt
22
41
0
16 May 2019
A Brain-inspired Algorithm for Training Highly Sparse Neural Networks
Zahra Atashgahi
Joost Pieterse
Shiwei Liu
Decebal Constantin Mocanu
Raymond N. J. Veldhuis
Mykola Pechenizkiy
35
15
0
17 Mar 2019
Investigating Antigram Behaviour using Distributional Semantics
Saptarshi Sengupta
16
0
0
15 Jan 2019
O2A: One-shot Observational learning with Action vectors
Leo Pauly
Wisdom C. Agboh
David C. Hogg
R. Fuentes
57
9
0
17 Oct 2018
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
493
11,727
0
09 Mar 2017
Quantifying the probable approximation error of probabilistic inference programs
Marco F. Cusumano-Towner
Vikash K. Mansinghka
33
7
0
31 May 2016
Previous
1
2
3
...
230
231
232