Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.14165
Cited By
Language Models are Few-Shot Learners
28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Ma-teusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Language Models are Few-Shot Learners"
50 / 1,609 papers shown
Title
Fair is Better than Sensational:Man is to Doctor as Woman is to Doctor
Malvina Nissim
Rik van Noord
Rob van der Goot
FaML
73
103
0
23 May 2019
HellaSwag: Can a Machine Really Finish Your Sentence?
Rowan Zellers
Ari Holtzman
Yonatan Bisk
Ali Farhadi
Yejin Choi
170
2,485
0
19 May 2019
Story Ending Prediction by Transferable BERT
Zhongyang Li
Xiao Ding
Ting Liu
65
52
0
17 May 2019
MASS: Masked Sequence to Sequence Pre-training for Language Generation
Kaitao Song
Xu Tan
Tao Qin
Jianfeng Lu
Tie-Yan Liu
112
965
0
07 May 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems
Alex Jinpeng Wang
Yada Pruksachatkun
Nikita Nangia
Amanpreet Singh
Julian Michael
Felix Hill
Omer Levy
Samuel R. Bowman
ELM
265
2,312
0
02 May 2019
Unsupervised Data Augmentation for Consistency Training
Qizhe Xie
Zihang Dai
Eduard H. Hovy
Minh-Thang Luong
Quoc V. Le
135
2,316
0
29 Apr 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
122
1,899
0
23 Apr 2019
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
184
3,184
0
22 Apr 2019
Improving Multi-Task Deep Neural Networks via Knowledge Distillation for Natural Language Understanding
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
FedML
54
182
0
20 Apr 2019
Lipstick on a Pig: Debiasing Methods Cover up Systematic Gender Biases in Word Embeddings But do not Remove Them
Hila Gonen
Yoav Goldberg
100
571
0
09 Mar 2019
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs
Dheeru Dua
Yizhong Wang
Pradeep Dasigi
Gabriel Stanovsky
Sameer Singh
Matt Gardner
AIMat
96
955
0
01 Mar 2019
Massively Multilingual Neural Machine Translation
Roee Aharoni
Melvin Johnson
Orhan Firat
LRM
AI4CE
77
488
0
28 Feb 2019
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy
Ellie Pavlick
Tal Linzen
131
1,239
0
04 Feb 2019
Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
AI4CE
124
1,271
0
31 Jan 2019
Learning and Evaluating General Linguistic Intelligence
Dani Yogatama
Cyprien de Masson dÁutume
Jerome T. Connor
Tomás Kociský
Mike Chrzanowski
...
Angeliki Lazaridou
Wang Ling
Lei Yu
Chris Dyer
Phil Blunsom
ELM
AI4CE
154
210
0
31 Jan 2019
Cross-lingual Language Model Pretraining
Guillaume Lample
Alexis Conneau
75
2,747
0
22 Jan 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
241
3,728
0
09 Jan 2019
An Empirical Model of Large-Batch Training
Sam McCandlish
Jared Kaplan
Dario Amodei
OpenAI Dota Team
65
277
0
14 Dec 2018
Sentence Encoders on STILTs: Supplementary Training on Intermediate Labeled-data Tasks
Jason Phang
Thibault Févry
Samuel R. Bowman
94
468
0
02 Nov 2018
ReCoRD: Bridging the Gap between Human and Machine Commonsense Reading Comprehension
Sheng Zhang
Xiaodong Liu
Jingjing Liu
Jianfeng Gao
Kevin Duh
Benjamin Van Durme
69
314
0
30 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
Model Cards for Model Reporting
Margaret Mitchell
Simone Wu
Andrew Zaldivar
Parker Barnes
Lucy Vasserman
Ben Hutchinson
Elena Spitzer
Inioluwa Deborah Raji
Timnit Gebru
123
1,895
0
05 Oct 2018
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering
Todor Mihaylov
Peter Clark
Tushar Khot
Ashish Sabharwal
113
1,537
0
08 Sep 2018
WiC: the Word-in-Context Dataset for Evaluating Context-Sensitive Meaning Representations
Mohammad Taher Pilehvar
Jose Camacho-Collados
192
489
0
28 Aug 2018
Dissecting Contextual Word Embeddings: Architecture and Representation
Matthew E. Peters
Mark Neumann
Luke Zettlemoyer
Wen-tau Yih
96
430
0
27 Aug 2018
Meta-Learning for Low-Resource Neural Machine Translation
Jiatao Gu
Yong Wang
Yun Chen
Kyunghyun Cho
Victor O.K. Li
74
343
0
25 Aug 2018
CoQA: A Conversational Question Answering Challenge
Siva Reddy
Danqi Chen
Christopher D. Manning
RALM
HAI
98
1,202
0
21 Aug 2018
QuAC : Question Answering in Context
Eunsol Choi
He He
Mohit Iyyer
Mark Yatskar
Wen-tau Yih
Yejin Choi
Percy Liang
Luke Zettlemoyer
119
826
0
21 Aug 2018
Universal Transformers
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser
85
753
0
10 Jul 2018
The Natural Language Decathlon: Multitask Learning as Question Answering
Bryan McCann
N. Keskar
Caiming Xiong
R. Socher
AIMat
MLLM
BDL
142
645
0
20 Jun 2018
Know What You Don't Know: Unanswerable Questions for SQuAD
Pranav Rajpurkar
Robin Jia
Percy Liang
RALM
ELM
279
2,840
0
11 Jun 2018
A Simple Method for Commonsense Reasoning
Trieu H. Trinh
Quoc V. Le
LRM
ReLM
93
433
0
07 Jun 2018
Gender Bias in Coreference Resolution
Rachel Rudinger
Jason Naradowsky
Brian Leonard
Benjamin Van Durme
65
642
0
25 Apr 2018
A Call for Clarity in Reporting BLEU Scores
Matt Post
151
2,988
0
23 Apr 2018
Think you have Solved Question Answering? Try ARC, the AI2 Reasoning Challenge
Peter Clark
Isaac Cowhey
Oren Etzioni
Tushar Khot
Ashish Sabharwal
Carissa Schoenick
Oyvind Tafjord
ELM
RALM
LRM
158
2,610
0
14 Mar 2018
Annotation Artifacts in Natural Language Inference Data
Suchin Gururangan
Swabha Swayamdipta
Omer Levy
Roy Schwartz
Samuel R. Bowman
Noah A. Smith
150
1,176
0
06 Mar 2018
Generating Wikipedia by Summarizing Long Sequences
Peter J. Liu
Mohammad Saleh
Etienne Pot
Ben Goodrich
Ryan Sepassi
Lukasz Kaiser
Noam M. Shazeer
CVBM
188
799
0
30 Jan 2018
Deep Learning Scaling is Predictable, Empirically
Joel Hestness
Sharan Narang
Newsha Ardalani
G. Diamos
Heewoo Jun
Hassan Kianinejad
Md. Mostofa Ali Patwary
Yang Yang
Yanqi Zhou
92
741
0
01 Dec 2017
Few-shot Autoregressive Density Estimation: Towards Learning to Learn Distributions
Scott E. Reed
Yutian Chen
T. Paine
Aaron van den Oord
S. M. Ali Eslami
Danilo Jimenez Rezende
Oriol Vinyals
Nando de Freitas
91
88
0
27 Oct 2017
Learned in Translation: Contextualized Word Vectors
Bryan McCann
James Bradbury
Caiming Xiong
R. Socher
117
909
0
01 Aug 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
701
131,652
0
12 Jun 2017
TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension
Mandar Joshi
Eunsol Choi
Daniel S. Weld
Luke Zettlemoyer
RALM
207
2,654
0
09 May 2017
RACE: Large-scale ReAding Comprehension Dataset From Examinations
Guokun Lai
Qizhe Xie
Hanxiao Liu
Yiming Yang
Eduard H. Hovy
ELM
183
1,348
0
15 Apr 2017
Domain Randomization for Transferring Deep Neural Networks from Simulation to the Real World
Joshua Tobin
Rachel Fong
Alex Ray
Jonas Schneider
Wojciech Zaremba
Pieter Abbeel
253
2,966
0
20 Mar 2017
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
823
11,909
0
09 Mar 2017
Learning to Optimize Neural Nets
Ke Li
Jitendra Malik
59
132
0
01 Mar 2017
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
248
2,644
0
23 Jan 2017
RL
2
^2
2
: Fast Reinforcement Learning via Slow Reinforcement Learning
Yan Duan
John Schulman
Xi Chen
Peter L. Bartlett
Ilya Sutskever
Pieter Abbeel
OffRL
96
1,019
0
09 Nov 2016
Sequence-Level Knowledge Distillation
Yoon Kim
Alexander M. Rush
116
1,115
0
25 Jun 2016
The LAMBADA dataset: Word prediction requiring a broad discourse context
Denis Paperno
Germán Kruszewski
Angeliki Lazaridou
Q. N. Pham
Raffaella Bernardi
Sandro Pezzelle
Marco Baroni
Gemma Boleda
Raquel Fernández
127
718
0
20 Jun 2016
Previous
1
2
3
...
31
32
33
Next