2005.14165
Cited By
v1
v2
v3
v4 (latest)
Language Models are Few-Shot Learners
28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Mateusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
Papers citing
"Language Models are Few-Shot Learners"
50 / 12,278 papers shown
Title
Behavior From the Void: Unsupervised Active Pre-Training
Hao Liu
Pieter Abbeel
VLM
SSL
124
206
0
08 Mar 2021
SCNN: Swarm Characteristic Neural Network
Nguyen Ha Thanh
Le-Minh Nguyen
25
0
0
08 Mar 2021
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
Bohan Wu
Suraj Nair
Roberto Martin-Martin
Li Fei-Fei
Chelsea Finn
DRL
89
102
0
06 Mar 2021
Putting Humans in the Natural Language Processing Loop: A Survey
Zijie J. Wang
Dongjin Choi
Shenyu Xu
Diyi Yang
LM&MA
91
74
0
06 Mar 2021
Perspectives and Prospects on Transformer Architecture for Cross-Modal Tasks with Language and Vision
Andrew Shin
Masato Ishii
T. Narihira
133
39
0
06 Mar 2021
The whole brain architecture approach: Accelerating the development of artificial general intelligence by referring to the brain
Hiroshi Yamakawa
AI4CE
50
17
0
06 Mar 2021
Accelerating SLIDE Deep Learning on Modern CPUs: Vectorization, Quantizations, Memory Optimizations, and More
Shabnam Daghaghi
Nicholas Meisburger
Mengnan Zhao
Yong Wu
Sameh Gobriel
Charlie Tai
Anshumali Shrivastava
BDL
VLM
MQ
48
33
0
06 Mar 2021
Causal Analysis of Agent Behavior for AI Safety
Grégoire Delétang
Jordi Grau-Moya
Miljan Martic
Tim Genewein
Tom McGrath
Vladimir Mikulik
M. Kunesch
Shane Legg
Pedro A. Ortega
CML
76
7
0
05 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
Basel Alomair
Jacob Steinhardt
ReLM
FaML
224
2,413
0
05 Mar 2021
OperA: Attention-Regularized Transformers for Surgical Phase Recognition
Tobias Czempiel
Magdalini Paschali
D. Ostler
S. T. Kim
Benjamin Busam
Nassir Navab
MedIm
107
89
0
05 Mar 2021
Generating Images with Sparse Representations
C. Nash
Jacob Menick
Sander Dieleman
Peter W. Battaglia
93
211
0
05 Mar 2021
Training a First-Order Theorem Prover from Synthetic Data
Vlad Firoiu
Eser Aygun
Ankit Anand
Zafarali Ahmed
Xavier Glorot
Laurent Orseau
Lei Zhang
Doina Precup
Shibl Mourad
NAI
78
14
0
05 Mar 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong
Jean-Baptiste Cordonnier
Andreas Loukas
161
388
0
05 Mar 2021
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Max Ryabinin
Eduard A. Gorbunov
Vsevolod Plokhotnyuk
Gennady Pekhimenko
133
35
0
04 Mar 2021
OAG-BERT: Towards A Unified Backbone Language Model For Academic Knowledge Services
Xiao Liu
Da Yin
Jingnan Zheng
Xingjian Zhang
Peng Zhang
Hongxia Yang
Yuxiao Dong
Jie Tang
VLM
102
32
0
03 Mar 2021
Random Feature Attention
Hao Peng
Nikolaos Pappas
Dani Yogatama
Roy Schwartz
Noah A. Smith
Lingpeng Kong
130
362
0
03 Mar 2021
Self-supervised Pretraining of Visual Features in the Wild
Priya Goyal
Mathilde Caron
Benjamin Lefaudeux
Min Xu
Pengchao Wang
...
Mannat Singh
Vitaliy Liptchinsky
Ishan Misra
Armand Joulin
Piotr Bojanowski
VLM
SSL
96
274
0
02 Mar 2021
Disentangling Syntax and Semantics in the Brain with Deep Networks
Charlotte Caucheteux
Alexandre Gramfort
J. King
129
74
0
02 Mar 2021
Generalizing to Unseen Domains: A Survey on Domain Generalization
Jindong Wang
Cuiling Lan
Chang-Shu Liu
Yidong Ouyang
Tao Qin
Wang Lu
Yiqiang Chen
Wenjun Zeng
Philip S. Yu
OOD
265
1,240
0
02 Mar 2021
Coordination Among Neural Modules Through a Shared Global Workspace
Anirudh Goyal
Aniket Didolkar
Alex Lamb
Kartikeya Badola
Nan Rosemary Ke
Nasim Rahaman
Jonathan Binas
Charles Blundell
Michael C. Mozer
Yoshua Bengio
219
99
0
01 Mar 2021
OmniNet: Omnidirectional Representations from Transformers
Yi Tay
Mostafa Dehghani
V. Aribandi
Jai Gupta
Philip Pham
Zhen Qin
Dara Bahri
Da-Cheng Juan
Donald Metzler
113
30
0
01 Mar 2021
M6: A Chinese Multimodal Pretrainer
Junyang Lin
Rui Men
An Yang
Chan Zhou
Ming Ding
...
Yong Li
Wei Lin
Jingren Zhou
J. Tang
Hongxia Yang
VLM
MoE
145
134
0
01 Mar 2021
Representation Learning for Event-based Visuomotor Policies
Sai H. Vemprala
Sami Mian
Ashish Kapoor
55
23
0
01 Mar 2021
Query Rewriting via Cycle-Consistent Translation for E-Commerce Search
Yiming Qiu
Kang Zhang
Han Zhang
Songlin Wang
Sulong Xu
Yun Xiao
Bo Long
Wen-Yun Yang
84
16
0
01 Mar 2021
On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal
Hongyi Wang
Shivaram Venkataraman
Dimitris Papailiopoulos
107
47
0
28 Feb 2021
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP
Timo Schick
Sahana Udupa
Hinrich Schütze
319
388
0
28 Feb 2021
A Survey on Stance Detection for Mis- and Disinformation Identification
Momchil Hardalov
Arnav Arora
Preslav Nakov
Isabelle Augenstein
203
136
0
27 Feb 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
405
1,592
0
27 Feb 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
1.0K
30,029
0
26 Feb 2021
What Doesn't Kill You Makes You Robust(er): How to Adversarially Train against Data Poisoning
Jonas Geiping
Liam H. Fowl
Gowthami Somepalli
Micah Goldblum
Michael Moeller
Tom Goldstein
TDI
AAML
SILM
46
41
0
26 Feb 2021
Chess as a Testbed for Language Model State Tracking
Shubham Toshniwal
Sam Wiseman
Karen Livescu
Kevin Gimpel
74
54
0
26 Feb 2021
Automated essay scoring using efficient transformer-based language models
C. Ormerod
Akanksha Malhotra
Amir Jafari
46
31
0
25 Feb 2021
Investigating the Limitations of Transformers with Simple Arithmetic Tasks
Rodrigo Nogueira
Zhiying Jiang
Jimmy J. Li
LRM
106
130
0
25 Feb 2021
Self-Tuning for Data-Efficient Deep Learning
Ximei Wang
Jing Gao
Mingsheng Long
Jianmin Wang
BDL
86
71
0
25 Feb 2021
SparseBERT: Rethinking the Importance Analysis in Self-attention
Han Shi
Jiahui Gao
Xiaozhe Ren
Hang Xu
Xiaodan Liang
Zhenguo Li
James T. Kwok
92
54
0
25 Feb 2021
Spanish Biomedical and Clinical Language Embeddings
Asier Gutiérrez-Fandiño
Jordi Armengol-Estapé
C. Carrino
Ona de Gibert
Aitor Gonzalez-Agirre
Marta Villegas
28
5
0
25 Feb 2021
Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning
Victor Campos
Pablo Sprechmann
Steven Hansen
André Barreto
Steven Kapturowski
Alex Vitvitskyi
Adria Puigdomenech Badia
Charles Blundell
OffRL
OnRL
72
26
0
24 Feb 2021
Automated Quality Assessment of Cognitive Behavioral Therapy Sessions Through Highly Contextualized Language Representations
Nikolaos Flemotomos
Víctor R. Martínez
Zhuohao Chen
Torrey A. Creed
David C. Atkins
Shrikanth Narayanan
62
31
0
23 Feb 2021
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models
Harold Ott
Jasmin Bogatinovski
Alexander Acker
S. Nedelkoski
O. Kao
23
31
0
23 Feb 2021
Equivariant neural networks for inverse problems
E. Celledoni
Matthias Joachim Ehrhardt
Christian Etmann
B. Owren
Carola-Bibiane Schönlieb
Ferdia Sherry
MedIm
AI4CE
83
27
0
23 Feb 2021
Parallelizing Legendre Memory Unit Training
Narsimha Chilkuri
C. Eliasmith
88
39
0
22 Feb 2021
Towards Causal Representation Learning
Bernhard Schölkopf
Francesco Locatello
Stefan Bauer
Nan Rosemary Ke
Nal Kalchbrenner
Anirudh Goyal
Yoshua Bengio
OOD
CML
AI4CE
155
322
0
22 Feb 2021
Position Information in Transformers: An Overview
Philipp Dufter
Martin Schmitt
Hinrich Schütze
93
148
0
22 Feb 2021
Revisiting Classification Perspective on Scene Text Recognition
Hongxiang Cai
Jun Sun
Yichao Xiong
73
10
0
22 Feb 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu
Amanpreet Singh
ViT
106
301
0
22 Feb 2021
Medical Transformer: Gated Axial-Attention for Medical Image Segmentation
Jeya Maria Jose Valanarasu
Poojan Oza
Ilker Hacihaliloglu
Vishal M. Patel
ViT
MedIm
135
1,003
0
21 Feb 2021
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
Cameron R. Wolfe
Jingkang Yang
Arindam Chowdhury
Chen Dun
Artun Bayer
Santiago Segarra
Anastasios Kyrillidis
BDL
GNN
LRM
120
9
0
20 Feb 2021
VisualGPT: Data-efficient Adaptation of Pretrained Language Models for Image Captioning
Jun Chen
Han Guo
Kai Yi
Boyang Albert Li
Mohamed Elhoseiny
VLM
149
227
0
20 Feb 2021
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input
Brooke Stephenson
Thomas Hueber
Laurent Girin
Laurent Besacier
89
10
0
19 Feb 2021
Improved Denoising Diffusion Probabilistic Models
Alex Nichol
Prafulla Dhariwal
DiffM
357
3,741
0
18 Feb 2021