Language Models are Few-Shot Learners
arXiv:2005.14165, 28 May 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, T. Henighan, R. Child, Aditya A. Ramesh, Daniel M. Ziegler, Jeff Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, B. Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei
BDL
Papers citing "Language Models are Few-Shot Learners"
Showing 50 of 11,497 citing papers
Automated Identification of Toxic Code Reviews Using ToxiCR
Jaydeb Sarker
Asif Kamal Turzo
Mingyou Dong
Amiangshu Bosu
27
31
0
26 Feb 2022
Combining Observational and Randomized Data for Estimating Heterogeneous Treatment Effects
Tobias Hatt
Jeroen Berrevoets
Alicia Curth
Stefan Feuerriegel
M. Schaar
CML
52
29
0
25 Feb 2022
Rethinking the Role of Demonstrations: What Makes In-Context Learning Work?
Sewon Min
Xinxi Lyu
Ari Holtzman
Mikel Artetxe
M. Lewis
Hannaneh Hajishirzi
Luke Zettlemoyer
LLMAG
LRM
76
1,409
0
25 Feb 2022
TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar
Anthony Sarah
Sairam Sundaresan
MQ
29
4
0
24 Feb 2022
Probing BERT's priors with serial reproduction chains
Takateru Yamakoshi
Thomas Griffiths
Robert D. Hawkins
29
12
0
24 Feb 2022
Is Neuro-Symbolic AI Meeting its Promise in Natural Language Processing? A Structured Review
Kyle Hamilton
Aparna Nayak
Bojan Bozic
Luca Longo
NAI
31
57
0
24 Feb 2022
Transformers in Medical Image Analysis: A Review
Kelei He
Chen Gan
Zhuoyuan Li
I. Rekik
Zihao Yin
Wen Ji
Yang Gao
Qian Wang
Junfeng Zhang
Dinggang Shen
ViT
MedIm
28
255
0
24 Feb 2022
Pretraining without Wordpieces: Learning Over a Vocabulary of Millions of Words
Zhangyin Feng
Duyu Tang
Cong Zhou
Junwei Liao
Shuangzhi Wu
Xiaocheng Feng
Bing Qin
Yunbo Cao
Shuming Shi
VLM
28
9
0
24 Feb 2022
From Natural Language to Simulations: Applying GPT-3 Codex to Automate Simulation Modeling of Logistics Systems
I. Jackson
M. J. Sáenz
16
8
0
24 Feb 2022
NoisyTune: A Little Noise Can Help You Finetune Pretrained Language Models Better
Chuhan Wu
Fangzhao Wu
Tao Qi
Yongfeng Huang
Xing Xie
25
58
0
24 Feb 2022
Sky Computing: Accelerating Geo-distributed Computing in Federated Learning
Jie Zhu
Shenggui Li
Yang You
FedML
16
5
0
24 Feb 2022
Using natural language prompts for machine translation
Xavier Garcia
Orhan Firat
AI4CE
30
30
0
23 Feb 2022
COLD Decoding: Energy-based Constrained Text Generation with Langevin Dynamics
Lianhui Qin
Sean Welleck
Daniel Khashabi
Yejin Choi
AI4CE
58
144
0
23 Feb 2022
Zero-shot Cross-lingual Transfer of Prompt-based Tuning with a Unified Multilingual Prompt
Lianzhe Huang
Shuming Ma
Dongdong Zhang
Furu Wei
Houfeng Wang
VLM
LRM
26
32
0
23 Feb 2022
Prompt-Learning for Short Text Classification
Yi Zhu
Xinke Zhou
Jipeng Qiang
Yun Li
Yunhao Yuan
Xindong Wu
VLM
18
34
0
23 Feb 2022
Memory Planning for Deep Neural Networks
Maksim Levental
33
4
0
23 Feb 2022
Blockchain Framework for Artificial Intelligence Computation
Jie You
6
8
0
23 Feb 2022
Evaluating Feature Attribution Methods in the Image Domain
Arne Gevaert
Axel-Jan Rousseau
Thijs Becker
D. Valkenborg
T. D. Bie
Yvan Saeys
FAtt
27
22
0
22 Feb 2022
Generating Videos with Dynamics-aware Implicit Generative Adversarial Networks
Sihyun Yu
Jihoon Tack
Sangwoo Mo
Hyunsu Kim
Junho Kim
Jung-Woo Ha
Jinwoo Shin
DiffM
VGen
42
199
0
21 Feb 2022
Transformer Quality in Linear Time
Weizhe Hua
Zihang Dai
Hanxiao Liu
Quoc V. Le
81
221
0
21 Feb 2022
Items from Psychometric Tests as Training Data for Personality Profiling Models of Twitter Users
Anne Kreuter
Kai Sassenberg
Roman Klinger
21
6
0
21 Feb 2022
Ligandformer: A Graph Neural Network for Predicting Compound Property with Robust Interpretation
Jinjiang Guo
Qi Liu
Han Guo
Xi Lu
AI4CE
24
3
0
21 Feb 2022
GPT-based Open-Ended Knowledge Tracing
Naiming Liu
Zichao Wang
Richard G. Baraniuk
Andrew S. Lan
AI4Ed
32
3
0
21 Feb 2022
Deconstructing Distributions: A Pointwise Framework of Learning
Gal Kaplun
Nikhil Ghosh
Saurabh Garg
Boaz Barak
Preetum Nakkiran
OOD
33
21
0
20 Feb 2022
$\mathcal{Y}$-Tuning: An Efficient Tuning Paradigm for Large-Scale Pre-Trained Models via Label Representation Learning
Yitao Liu
Chen An
Xipeng Qiu
29
17
0
20 Feb 2022
Visual Attention Network
Meng-Hao Guo
Chengrou Lu
Zheng-Ning Liu
Ming-Ming Cheng
Shiyong Hu
ViT
VLM
24
638
0
20 Feb 2022
COMPASS: Contrastive Multimodal Pretraining for Autonomous Systems
Shuang Ma
Sai H. Vemprala
Wenshan Wang
Jayesh K. Gupta
Yale Song
Daniel J. McDuff
Ashish Kapoor
SSL
37
9
0
20 Feb 2022
Do Transformers know symbolic rules, and would we know if they did?
Tommi Gröndahl
Yu-Wen Guo
Nirmal Asokan
30
0
0
19 Feb 2022
TransDreamer: Reinforcement Learning with Transformer World Models
Changgu Chen
Yi-Fu Wu
Jaesik Yoon
Sungjin Ahn
OffRL
32
91
0
19 Feb 2022
Synthetic Disinformation Attacks on Automated Fact Verification Systems
Y. Du
Antoine Bosselut
Christopher D. Manning
AAML
OffRL
36
32
0
18 Feb 2022
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou
Tao Lei
Han-Chu Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
160
331
0
18 Feb 2022
A Survey of Vision-Language Pre-Trained Models
Yifan Du
Zikang Liu
Junyi Li
Wayne Xin Zhao
VLM
42
180
0
18 Feb 2022
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph
Irwan Bello
Sameer Kumar
Nan Du
Yanping Huang
J. Dean
Noam M. Shazeer
W. Fedus
MoE
24
183
0
17 Feb 2022
cosFormer: Rethinking Softmax in Attention
Zhen Qin
Weixuan Sun
Huicai Deng
Dongxu Li
Yunshen Wei
Baohong Lv
Junjie Yan
Lingpeng Kong
Yiran Zhong
38
212
0
17 Feb 2022
Revisiting Over-smoothing in BERT from the Perspective of Graph
Han Shi
Jiahui Gao
Hang Xu
Xiaodan Liang
Zhenguo Li
Lingpeng Kong
Stephen M. S. Lee
James T. Kwok
24
71
0
17 Feb 2022
Open-Ended Reinforcement Learning with Neural Reward Functions
Robert Meier
Asier Mujika
37
7
0
16 Feb 2022
Revisiting Parameter-Efficient Tuning: Are We Really There Yet?
Guanzheng Chen
Fangyu Liu
Zaiqiao Meng
Shangsong Liang
26
88
0
16 Feb 2022
EdgeFormer: A Parameter-Efficient Transformer for On-Device Seq2seq Generation
Tao Ge
Si-Qing Chen
Furu Wei
MoE
32
20
0
16 Feb 2022
ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye
Jiahui Gao
Qintong Li
Hang Xu
Jiangtao Feng
Zhiyong Wu
Tao Yu
Lingpeng Kong
SyDa
47
212
0
16 Feb 2022
General-purpose, long-context autoregressive modeling with Perceiver AR
Curtis Hawthorne
Andrew Jaegle
Cătălina Cangea
Sebastian Borgeaud
C. Nash
...
Hannah R. Sheahan
Neil Zeghidour
Jean-Baptiste Alayrac
João Carreira
Jesse Engel
43
65
0
15 Feb 2022
Better Together? An Evaluation of AI-Supported Code Translation
Justin D. Weisz
Michael J. Muller
Steven I. Ross
Fernando Martinez
Stephanie Houde
Mayank Agarwal
Kartik Talamadupula
John T. Richards
29
67
0
15 Feb 2022
P4E: Few-Shot Event Detection as Prompt-Guided Identification and Localization
Sha Li
Liyuan Liu
Yiqing Xie
Heng Ji
Jiawei Han
41
4
0
15 Feb 2022
Impact of Pretraining Term Frequencies on Few-Shot Reasoning
Yasaman Razeghi
Robert L Logan IV
Matt Gardner
Sameer Singh
ReLM
LRM
32
150
0
15 Feb 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
43
88
0
14 Feb 2022
Deduplicating Training Data Mitigates Privacy Risks in Language Models
Nikhil Kandpal
Eric Wallace
Colin Raffel
PILM
MU
54
275
0
14 Feb 2022
Transformer-based Approaches for Legal Text Processing
Nguyen Ha Thanh
Phuong Minh Nguyen
Thi-Hai-Yen Vuong
Minh Q. Bui
Minh-Chau Nguyen
Binh Dang
Vu Tran
Le-Minh Nguyen
Kenji Satoh
AILaw
24
12
0
13 Feb 2022
Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments
Maor Ivgi
Y. Carmon
Jonathan Berant
19
17
0
13 Feb 2022
Flowformer: Linearizing Transformers with Conservation Flows
Haixu Wu
Jialong Wu
Jiehui Xu
Jianmin Wang
Mingsheng Long
14
90
0
13 Feb 2022
Semantic-Oriented Unlabeled Priming for Large-Scale Language Models
Yanchen Liu
Timo Schick
Hinrich Schütze
VLM
33
15
0
12 Feb 2022
Maximizing Communication Efficiency for Large-scale Training via 0/1 Adam
Yucheng Lu
Conglong Li
Minjia Zhang
Christopher De Sa
Yuxiong He
OffRL
AI4CE
26
20
0
12 Feb 2022