arXiv:2005.14165
Language Models are Few-Shot Learners
28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
Tom Henighan
Rewon Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Mateusz Litwin
Scott Gray
Benjamin Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
Papers citing "Language Models are Few-Shot Learners" (50 of 11,214 papers shown)
Computer-Aided Design as Language
Yaroslav Ganin
Sergey Bartunov
Yujia Li
E. Keller
Stefano Saliceti
3DV
104
90
0
06 May 2021
Reliability Testing for Natural Language Processing Systems
Samson Tan
Chenyu You
K. Baxter
Araz Taeihagh
G. Bennett
Min-Yen Kan
15
38
0
06 May 2021
A Unified Transferable Model for ML-Enhanced DBMS
Ziniu Wu
Pei Yu
Peilun Yang
Rong Zhu
Yuxing Han
Yaliang Li
Defu Lian
K. Zeng
Jingren Zhou
42
31
0
06 May 2021
Training Quantum Embedding Kernels on Near-Term Quantum Computers
T. Hubregtsen
David Wierichs
Elies Gil-Fuster
Peter-Jan H. S. Derks
Paul K. Faehrmann
Johannes Jakob Meyer
28
95
0
05 May 2021
Rethinking Search: Making Domain Experts out of Dilettantes
Donald Metzler
Yi Tay
Dara Bahri
Marc Najork
LRM
38
46
0
05 May 2021
Image Embedding and Model Ensembling for Automated Chest X-Ray Interpretation
E. Giacomello
P. Lanzi
Daniele Loiacono
Luca Nassano
17
5
0
05 May 2021
TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval
Yongbiao Chen
Shenmin Zhang
Fangxin Liu
Zhigang Chang
Mang Ye
Zhengwei Qi
ViT
29
50
0
05 May 2021
HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish
Robert Mroczkowski
Piotr Rybak
Alina Wróblewska
Ireneusz Gawlik
36
81
0
04 May 2021
Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning
Xiangyu Peng
Siyan Li
Sarah Wiegreffe
Mark O. Riedl
LRM
50
38
0
04 May 2021
One Model to Rule them All: Towards Zero-Shot Learning for Databases
Benjamin Hilprecht
Carsten Binnig
VLM
29
33
0
03 May 2021
Data-driven Weight Initialization with Sylvester Solvers
Debasmit Das
Yash Bhalgat
Fatih Porikli
ODL
38
3
0
02 May 2021
Stealthy Backdoors as Compression Artifacts
Yulong Tian
Fnu Suya
Fengyuan Xu
David Evans
35
22
0
30 Apr 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Bowen Li
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Yifan Jiang
Min Ma
Junwen Bai
CLL
34
76
0
30 Apr 2021
Entailment as Few-Shot Learner
Sinong Wang
Han Fang
Madian Khabsa
Hanzi Mao
Hao Ma
35
183
0
29 Apr 2021
MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains
Yaxing Wang
Abel Gonzalez-Garcia
Chenshen Wu
Luis Herranz
Fahad Shahbaz Khan
Shangling Jui
Joost van de Weijer
32
6
0
28 Apr 2021
Sifting out the features by pruning: Are convolutional networks the winning lottery ticket of fully connected ones?
Franco Pellegrini
Giulio Biroli
49
6
0
27 Apr 2021
Shellcode_IA32: A Dataset for Automatic Shellcode Generation
Pietro Liguori
Erfan Al-Hossami
Domenico Cotroneo
R. Natella
B. Cukic
Samira Shaikh
34
27
0
27 Apr 2021
AT-ST: Self-Training Adaptation Strategy for OCR in Domains with Limited Transcriptions
M. Kišš
Karel Beneš
Michal Hradiš
64
13
0
27 Apr 2021
If your data distribution shifts, use self-learning
E. Rusak
Steffen Schneider
George Pachitariu
L. Eck
Peter V. Gehler
Oliver Bringmann
Wieland Brendel
Matthias Bethge
VLM
OOD
TTA
81
30
0
27 Apr 2021
One Billion Audio Sounds from GPU-enabled Modular Synthesis
Joseph P. Turian
Jordie Shier
George Tzanetakis
K. McNally
Max Henry
21
22
0
27 Apr 2021
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng
Xiaozhe Ren
Teng Su
Hui Wang
Yi-Lun Liao
...
Gaojun Fan
Yaowei Wang
Xuefeng Jin
Qun Liu
Yonghong Tian
ALM
MoE
AI4CE
35
212
0
26 Apr 2021
Generating abstractive summaries of Lithuanian news articles using a transformer model
Lukas Stankevicius
M. Lukoševičius
24
2
0
23 Apr 2021
Partitioning sparse deep neural networks for scalable training and inference
G. Demirci
Hakan Ferhatosmanoglu
20
11
0
23 Apr 2021
Literature review on vulnerability detection using NLP technology
Jiajie Wu
39
14
0
23 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
Understanding and Avoiding AI Failures: A Practical Guide
R. M. Williams
Roman V. Yampolskiy
30
24
0
22 Apr 2021
All Tokens Matter: Token Labeling for Training Better Vision Transformers
Zihang Jiang
Qibin Hou
Li-xin Yuan
Daquan Zhou
Yujun Shi
Xiaojie Jin
Anran Wang
Jiashi Feng
ViT
27
204
0
22 Apr 2021
Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?
William Merrill
Yoav Goldberg
Roy Schwartz
Noah A. Smith
25
67
0
22 Apr 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen
Jiamin Ni
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
...
Naigang Wang
Swagath Venkataramani
Vijayalakshmi Srinivasan
Wei Zhang
K. Gopalakrishnan
29
66
0
21 Apr 2021
Adapting Long Context NLM for ASR Rescoring in Conversational Agents
Ashish Shenoy
S. Bodapati
Monica Sunkara
S. Ronanki
Katrin Kirchhoff
31
21
0
21 Apr 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
46
2,190
0
20 Apr 2021
BERTić -- The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian
N. Ljubešić
D. Lauc
16
48
0
19 Apr 2021
A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
Yifei Ding
M. Jia
Qiuhua Miao
Yudong Cao
16
268
0
19 Apr 2021
An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
A. Kahira
Truong Thao Nguyen
L. Bautista-Gomez
Ryousei Takano
Rosa M. Badia
M. Wahib
15
9
0
19 Apr 2021
Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
Rui Cheng
Bichen Wu
Peizhao Zhang
Peter Vajda
Joseph E. Gonzalez
CLIP
VLM
21
31
0
18 Apr 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
223
180
0
18 Apr 2021
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
Kang Min Yoo
Dongju Park
Jaewook Kang
Sang-Woo Lee
Woomyeong Park
36
235
0
18 Apr 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
279
1,125
0
18 Apr 2021
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Swaroop Mishra
Daniel Khashabi
Chitta Baral
Hannaneh Hajishirzi
LRM
60
719
0
18 Apr 2021
Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
Mozhdeh Gheini
Xiang Ren
Jonathan May
LRM
31
105
0
18 Apr 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
W. Dolan
HILM
228
144
0
18 Apr 2021
ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table
Huifeng Guo
Wei Guo
Yong Gao
Ruiming Tang
Xiuqiang He
Wenzhi Liu
38
20
0
17 Apr 2021
Data Distillation for Text Classification
Yongqi Li
Wenjie Li
DD
27
28
0
17 Apr 2021
On the Importance of Effectively Adapting Pretrained Language Models for Active Learning
Katerina Margatina
Loïc Barrault
Nikolaos Aletras
27
36
0
16 Apr 2021
What to Pre-Train on? Efficient Intermediate Task Selection
Clifton A. Poth
Jonas Pfeiffer
Andreas Rucklé
Iryna Gurevych
21
94
0
16 Apr 2021
Editing Factual Knowledge in Language Models
Nicola De Cao
Wilker Aziz
Ivan Titov
KELM
68
476
0
16 Apr 2021
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema
Yanai Elazar
Hongming Zhang
Yoav Goldberg
Dan Roth
ReLM
LRM
45
44
0
16 Apr 2021
Language Models are Few-Shot Butlers
Vincent Micheli
François Fleuret
25
31
0
16 Apr 2021
Probing Across Time: What Does RoBERTa Know and When?
Leo Z. Liu
Yizhong Wang
Jungo Kasai
Hannaneh Hajishirzi
Noah A. Smith
KELM
13
80
0
16 Apr 2021
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
41
370
0
16 Apr 2021