2005.14165
Cited By
v4 (latest)
Language Models are Few-Shot Learners
28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Mateusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
Papers citing
"Language Models are Few-Shot Learners"
50 / 12,243 papers shown
Title
Learning to Recognize Dialect Features
Dorottya Demszky
D. Sharma
J. Clark
Vinodkumar Prabhakaran
Jacob Eisenstein
216
39
0
23 Oct 2020
Long Document Ranking with Query-Directed Sparse Transformer
Jyun-Yu Jiang
Chenyan Xiong
Chia-Jung Lee
Wei Wang
71
25
0
23 Oct 2020
Robust Document Representations using Latent Topics and Metadata
Natraj Raman
Armineh Nourbakhsh
Sameena Shah
Manuela Veloso
21
0
0
23 Oct 2020
On the Transformer Growth for Progressive BERT Training
Xiaotao Gu
Liyuan Liu
Hongkun Yu
Jing Li
Chong Chen
Jiawei Han
VLM
120
54
0
23 Oct 2020
An Analysis of LIME for Text Data
Dina Mardaoui
Damien Garreau
FAtt
187
45
0
23 Oct 2020
Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension
Siamak Shakeri
Noah Constant
Mihir Kale
Linting Xue
SyDa
72
28
0
22 Oct 2020
The Turking Test: Can Language Models Understand Instructions?
Avia Efrat
Omer Levy
ELM
LRM
114
96
0
22 Oct 2020
Language Models are Open Knowledge Graphs
Chenguang Wang
Xiao Liu
Basel Alomair
SSL
KELM
79
137
0
22 Oct 2020
Limitations of Autoregressive Models and Their Alternatives
Chu-cheng Lin
Aaron Jaech
Xin Li
Matthew R. Gormley
Jason Eisner
86
63
0
22 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
702
41,681
0
22 Oct 2020
AdapterDrop: On the Efficiency of Adapters in Transformers
Andreas Rucklé
Gregor Geigle
Max Glockner
Tilman Beck
Jonas Pfeiffer
Nils Reimers
Iryna Gurevych
125
267
0
22 Oct 2020
Incorporating Stylistic Lexical Preferences in Generative Language Models
Hrituraj Singh
Gaurav Verma
Balaji Vasan Srinivasan
23
5
0
22 Oct 2020
MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation
Junkun Chen
Mingbo Ma
Renjie Zheng
Liang Huang
90
21
0
22 Oct 2020
Is Retriever Merely an Approximator of Reader?
Sohee Yang
Minjoon Seo
RALM
85
42
0
21 Oct 2020
Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition
Anand Panchbhai
Tommaso Soru
Edgard Marx
18
5
0
21 Oct 2020
Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation
Laurel J. Orr
Megan Leszczynski
Simran Arora
Sen Wu
Neel Guha
Xiao Ling
Christopher Ré
209
48
0
20 Oct 2020
Local Knowledge Powered Conversational Agents
Sashank Santhanam
Ming-Yu Liu
Raul Puri
Mohammad Shoeybi
M. Patwary
Bryan Catanzaro
93
4
0
20 Oct 2020
Neural Language Modeling for Contextualized Temporal Graph Generation
Aman Madaan
Yiming Yang
101
20
0
20 Oct 2020
Optimism in the Face of Adversity: Understanding and Improving Deep Learning through Adversarial Robustness
Guillermo Ortiz-Jiménez
Apostolos Modas
Seyed-Mohsen Moosavi-Dezfooli
P. Frossard
AAML
121
48
0
19 Oct 2020
Consistency and Coherency Enhanced Story Generation
Wei Wang
Piji Li
Haitao Zheng
71
11
0
17 Oct 2020
CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding
Yanru Qu
Dinghan Shen
Yelong Shen
Sandra Sajeev
Jiawei Han
Weizhu Chen
204
69
0
16 Oct 2020
Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data
Mao Ye
Dhruv Choudhary
Jiecao Yu
Ellie Wen
Zeliang Chen
Jiyan Yang
Jongsoo Park
Qiang Liu
A. Kejariwal
56
9
0
16 Oct 2020
Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf Language Models
Peter West
Ximing Lu
Ari Holtzman
Chandra Bhagavatula
Jena D. Hwang
Yejin Choi
OffRL
59
13
0
16 Oct 2020
An Approximation Algorithm for Optimal Subarchitecture Extraction
Adrian de Wynter
70
1
0
16 Oct 2020
For self-supervised learning, Rationality implies generalization, provably
Yamini Bansal
Gal Kaplun
Boaz Barak
OOD
SSL
112
22
0
16 Oct 2020
The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
Preetum Nakkiran
Behnam Neyshabur
Hanie Sedghi
OffRL
97
11
0
16 Oct 2020
PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music
Hongru Liang
Wenqiang Lei
P. Chan
Zhenglu Yang
Maosong Sun
Tat-Seng Chua
64
41
0
16 Oct 2020
Masked Contrastive Representation Learning for Reinforcement Learning
Jinhua Zhu
Yingce Xia
Lijun Wu
Jiajun Deng
Wen-gang Zhou
Tao Qin
Houqiang Li
SSL
OffRL
110
60
0
15 Oct 2020
Decoding Methods for Neural Narrative Generation
Alexandra DeLucia
Aaron Mueller
Xiang Lisa Li
João Sedoc
62
26
0
14 Oct 2020
Explaining Creative Artifacts
Lav Varshney
Nazneen Rajani
R. Socher
129
2
0
14 Oct 2020
Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
Xiaofei Sun
Zijun Sun
Yuxian Meng
Jiwei Li
Chun Fan
59
20
0
14 Oct 2020
Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
Gyuwan Kim
Kyunghyun Cho
94
98
0
14 Oct 2020
Neural Databases
James Thorne
Majid Yazdani
Marzieh Saeidi
Fabrizio Silvestri
Sebastian Riedel
A. Halevy
NAI
96
9
0
14 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin
Rodrigo Nogueira
Andrew Yates
VLM
387
628
0
13 Oct 2020
MixCo: Mix-up Contrastive Learning for Visual Representation
Sungnyun Kim
Gihun Lee
Sangmin Bae
Seyoung Yun
SSL
167
81
0
13 Oct 2020
Improving Text Generation with Student-Forcing Optimal Transport
Guoyin Wang
Chunyuan Li
Jianqiao Li
Hao Fu
Yuh-Chen Lin
...
Ruiyi Zhang
Wenlin Wang
Dinghan Shen
Qian Yang
Lawrence Carin
OT
78
18
0
12 Oct 2020
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
Jena D. Hwang
Chandra Bhagavatula
Ronan Le Bras
Jeff Da
Keisuke Sakaguchi
Antoine Bosselut
Yejin Choi
81
415
0
12 Oct 2020
Neural, Symbolic and Neural-Symbolic Reasoning on Knowledge Graphs
Jing Zhang
Bo Chen
Lingxi Zhang
Xirui Ke
Haipeng Ding
NAI
97
3
0
12 Oct 2020
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
Alex Warstadt
Yian Zhang
Haau-Sing Li
Haokun Liu
Samuel R. Bowman
SSL
AI4CE
78
21
0
11 Oct 2020
SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras
Nikita Kitaev
Augustus Odena
A. Dimakis
101
46
0
11 Oct 2020
What causes the test error? Going beyond bias-variance via ANOVA
Licong Lin
Yan Sun
93
34
0
11 Oct 2020
AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data
Silei Xu
Sina J. Semnani
Giovanni Campagna
M. Lam
76
52
0
09 Oct 2020
On the importance of pre-training data volume for compact language models
Vincent Micheli
Martin d'Hoffschmidt
François Fleuret
67
42
0
08 Oct 2020
AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models
Amrit Nagarajan
Sanchari Sen
Jacob R. Stevens
A. Raghunathan
18
3
0
07 Oct 2020
A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
Nikunj Saunshi
Sadhika Malladi
Sanjeev Arora
87
89
0
07 Oct 2020
A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments
M. Golzadeh
Alexandre Decan
Damien Legay
T. Mens
57
78
0
07 Oct 2020
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
Junwen Bai
Weiran Wang
Yingbo Zhou
Caiming Xiong
SSL
AI4TS
75
12
0
07 Oct 2020
A Closer Look at Codistillation for Distributed Training
Shagun Sodhani
Olivier Delalleau
Mahmoud Assran
Koustuv Sinha
Nicolas Ballas
Michael G. Rabbat
129
8
0
06 Oct 2020
A Transformer-based Framework for Multivariate Time Series Representation Learning
George Zerveas
Srideepika Jayaraman
Dhaval Patel
A. Bhamidipaty
Carsten Eickhoff
AI4TS
109
940
0
06 Oct 2020
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
Wei Ping
Shuohang Wang
Yu Cheng
Zhe Gan
R. Jia
Yue Liu
Jingjing Liu
AAML
215
116
0
05 Oct 2020