2005.14165
Cited By
Language Models are Few-Shot Learners
28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Mateusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
Papers citing "Language Models are Few-Shot Learners"
50 / 11,497 papers shown
Title
AT-ST: Self-Training Adaptation Strategy for OCR in Domains with Limited Transcriptions
M. Kišš
Karel Beneš
Michal Hradiš
64
13
0
27 Apr 2021
If your data distribution shifts, use self-learning
E. Rusak
Steffen Schneider
George Pachitariu
L. Eck
Peter V. Gehler
Oliver Bringmann
Wieland Brendel
Matthias Bethge
VLM
OOD
TTA
81
30
0
27 Apr 2021
One Billion Audio Sounds from GPU-enabled Modular Synthesis
Joseph P. Turian
Jordie Shier
George Tzanetakis
K. McNally
Max Henry
21
22
0
27 Apr 2021
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng
Xiaozhe Ren
Teng Su
Hui Wang
Yi-Lun Liao
...
Gaojun Fan
Yaowei Wang
Xuefeng Jin
Qun Liu
Yonghong Tian
ALM
MoE
AI4CE
35
212
0
26 Apr 2021
Generating abstractive summaries of Lithuanian news articles using a transformer model
Lukas Stankevicius
M. Lukoševičius
24
2
0
23 Apr 2021
Partitioning sparse deep neural networks for scalable training and inference
G. Demirci
Hakan Ferhatosmanoglu
20
11
0
23 Apr 2021
Literature review on vulnerability detection using NLP technology
Jiajie Wu
39
14
0
23 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
Understanding and Avoiding AI Failures: A Practical Guide
R. M. Williams
Roman V. Yampolskiy
30
24
0
22 Apr 2021
All Tokens Matter: Token Labeling for Training Better Vision Transformers
Zihang Jiang
Qibin Hou
Li-xin Yuan
Daquan Zhou
Yujun Shi
Xiaojie Jin
Anran Wang
Jiashi Feng
ViT
27
203
0
22 Apr 2021
Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?
William Merrill
Yoav Goldberg
Roy Schwartz
Noah A. Smith
25
67
0
22 Apr 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen
Jiamin Ni
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
...
Naigang Wang
Swagath Venkataramani
Vijayalakshmi Srinivasan
Wei Zhang
K. Gopalakrishnan
29
66
0
21 Apr 2021
Adapting Long Context NLM for ASR Rescoring in Conversational Agents
Ashish Shenoy
S. Bodapati
Monica Sunkara
S. Ronanki
Katrin Kirchhoff
31
21
0
21 Apr 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
46
2,190
0
20 Apr 2021
BERTić -- The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian
N. Ljubešić
D. Lauc
16
48
0
19 Apr 2021
A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
Yifei Ding
M. Jia
Qiuhua Miao
Yudong Cao
16
268
0
19 Apr 2021
An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
A. Kahira
Truong Thao Nguyen
L. Bautista-Gomez
Ryousei Takano
Rosa M. Badia
M. Wahib
15
9
0
19 Apr 2021
Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
Rui Cheng
Bichen Wu
Peizhao Zhang
Peter Vajda
Joseph E. Gonzalez
CLIP
VLM
21
31
0
18 Apr 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
223
180
0
18 Apr 2021
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
Kang Min Yoo
Dongju Park
Jaewook Kang
Sang-Woo Lee
Woomyeong Park
39
235
0
18 Apr 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
281
1,125
0
18 Apr 2021
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Swaroop Mishra
Daniel Khashabi
Chitta Baral
Hannaneh Hajishirzi
LRM
60
719
0
18 Apr 2021
Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
Mozhdeh Gheini
Xiang Ren
Jonathan May
LRM
31
105
0
18 Apr 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
W. Dolan
HILM
228
144
0
18 Apr 2021
ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table
Huifeng Guo
Wei Guo
Yong Gao
Ruiming Tang
Xiuqiang He
Wenzhi Liu
43
20
0
17 Apr 2021
Data Distillation for Text Classification
Yongqi Li
Wenjie Li
DD
30
28
0
17 Apr 2021
On the Importance of Effectively Adapting Pretrained Language Models for Active Learning
Katerina Margatina
Loïc Barrault
Nikolaos Aletras
27
36
0
16 Apr 2021
What to Pre-Train on? Efficient Intermediate Task Selection
Clifton A. Poth
Jonas Pfeiffer
Andreas Rucklé
Iryna Gurevych
24
94
0
16 Apr 2021
Editing Factual Knowledge in Language Models
Nicola De Cao
Wilker Aziz
Ivan Titov
KELM
68
476
0
16 Apr 2021
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema
Yanai Elazar
Hongming Zhang
Yoav Goldberg
Dan Roth
ReLM
LRM
45
44
0
16 Apr 2021
Language Models are Few-Shot Butlers
Vincent Micheli
François Fleuret
25
31
0
16 Apr 2021
Probing Across Time: What Does RoBERTa Know and When?
Leo Z. Liu
Yizhong Wang
Jungo Kasai
Hannaneh Hajishirzi
Noah A. Smith
KELM
13
80
0
16 Apr 2021
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
41
370
0
16 Apr 2021
Sublanguage: A Serious Issue Affects Pretrained Models in Legal Domain
Nguyen Ha Thanh
Le-Minh Nguyen
ELM
AILaw
11
2
0
15 Apr 2021
Does BERT Pretrained on Clinical Notes Reveal Sensitive Data?
Eric P. Lehman
Sarthak Jain
Karl Pichotta
Yoav Goldberg
Byron C. Wallace
OOD
MIACV
24
118
0
15 Apr 2021
Actionable Models: Unsupervised Offline Reinforcement Learning of Robotic Skills
Yevgen Chebotar
Karol Hausman
Yao Lu
Ted Xiao
Dmitry Kalashnikov
...
A. Irpan
Benjamin Eysenbach
Ryan Julian
Chelsea Finn
Sergey Levine
SSL
OffRL
32
146
0
15 Apr 2021
How to Train BERT with an Academic Budget
Peter Izsak
Moshe Berchansky
Omer Levy
23
113
0
15 Apr 2021
Self-supervised Video Object Segmentation by Motion Grouping
Charig Yang
Hala Lamdouar
Erika Lu
Andrew Zisserman
Weidi Xie
VOS
OCL
30
157
0
15 Apr 2021
KnowPrompt: Knowledge-aware Prompt-tuning with Synergistic Optimization for Relation Extraction
Xiang Chen
Ningyu Zhang
Xin Xie
Shumin Deng
Yunzhi Yao
Chuanqi Tan
Fei Huang
Luo Si
Huajun Chen
38
402
0
15 Apr 2021
Generating Datasets with Pretrained Language Models
Timo Schick
Hinrich Schütze
24
234
0
15 Apr 2021
Unlocking Compositional Generalization in Pre-trained Models Using Intermediate Representations
Jonathan Herzig
Peter Shaw
Ming-Wei Chang
Kelvin Guu
Panupong Pasupat
Yuan Zhang
AI4CE
30
67
0
15 Apr 2021
Domain Adaptation and Multi-Domain Adaptation for Neural Machine Translation: A Survey
Danielle Saunders
AI4CE
27
86
0
14 Apr 2021
Natural-Language Multi-Agent Simulations of Argumentative Opinion Dynamics
Gregor Betz
LLMAG
AI4CE
19
9
0
14 Apr 2021
Developing a Conversational Recommendation System for Navigating Limited Options
Victor S. Bursztyn
Jennifer Healey
Eunyee Koh
Nedim Lipka
Larry Birnbaum
17
7
0
13 Apr 2021
Relational World Knowledge Representation in Contextual Language Models: A Review
Tara Safavi
Danai Koutra
KELM
38
51
0
12 Apr 2021
Survey on reinforcement learning for language processing
Víctor Uc Cetina
Nicolás Navarro-Guerrero
A. Martín-González
C. Weber
S. Wermter
OffRL
24
101
0
12 Apr 2021
FUDGE: Controlled Text Generation With Future Discriminators
Kevin Kaichuang Yang
Dan Klein
39
314
0
12 Apr 2021
Adapting Language Models for Zero-shot Learning by Meta-tuning on Dataset and Prompt Collections
Ruiqi Zhong
Kristy Lee
Zheng-Wei Zhang
Dan Klein
39
166
0
10 Apr 2021
Text2Chart: A Multi-Staged Chart Generator from Natural Language Text
Md. Mahinur Rashid
Hasin Kawsar Jahan
Annysha Huzzat
Riyasaat Ahmed Rahul
Tamim Bin Zakir
F. Meem
Md. Saddam Hossain Mukta
Swakkhar Shatabda
35
8
0
09 Apr 2021
Efficient Large-Scale Language Model Training on GPU Clusters Using Megatron-LM
Deepak Narayanan
M. Shoeybi
Jared Casper
P. LeGresley
M. Patwary
...
Prethvi Kashinkunti
J. Bernauer
Bryan Catanzaro
Amar Phanishayee
Matei A. Zaharia
MoE
37
651
0
09 Apr 2021