ResearchTrend.AI
Language Models are Few-Shot Learners

28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Mateusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
    BDL

Papers citing "Language Models are Few-Shot Learners"

50 / 11,568 papers shown
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du
Yujie Qian
Xiao Liu
Ming Ding
J. Qiu
Zhilin Yang
Jie Tang
BDL
AI4CE
51
1,496
0
18 Mar 2021
Towards Few-Shot Fact-Checking via Perplexity
Nayeon Lee
Yejin Bang
Andrea Madotto
Madian Khabsa
Pascale Fung
AAML
13
90
0
17 Mar 2021
How Many Data Points is a Prompt Worth?
Teven Le Scao
Alexander M. Rush
VLM
66
296
0
15 Mar 2021
A Whole Brain Probabilistic Generative Model: Toward Realizing Cognitive Architectures for Developmental Robots
T. Taniguchi
Hiroshi Yamakawa
Takayuki Nagai
Kenji Doya
M. Sakagami
Masahiro Suzuki
Tomoaki Nakamura
Akira Taniguchi
28
23
0
15 Mar 2021
Revisiting ResNets: Improved Training and Scaling Strategies
Irwan Bello
W. Fedus
Xianzhi Du
E. D. Cubuk
A. Srinivas
Tsung-Yi Lin
Jonathon Shlens
Barret Zoph
31
298
0
13 Mar 2021
Inductive Relation Prediction by BERT
H. Zha
Zhiyu Zoey Chen
Xifeng Yan
29
54
0
12 Mar 2021
CANINE: Pre-training an Efficient Tokenization-Free Encoder for Language Representation
J. Clark
Dan Garrette
Iulia Turc
John Wieting
38
210
0
11 Mar 2021
Integration of Convolutional Neural Networks in Mobile Applications
Roger Creus Castanyer
Silverio Martínez-Fernández
Xavier Franch
29
12
0
11 Mar 2021
BYOL for Audio: Self-Supervised Learning for General-Purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
38
175
0
11 Mar 2021
CUAD: An Expert-Annotated NLP Dataset for Legal Contract Review
Dan Hendrycks
Collin Burns
Anya Chen
Spencer Ball
ELM
AILaw
23
185
0
10 Mar 2021
Pretrained Transformers as Universal Computation Engines
Kevin Lu
Aditya Grover
Pieter Abbeel
Igor Mordatch
28
218
0
09 Mar 2021
Knowledge Evolution in Neural Networks
Ahmed Taha
Abhinav Shrivastava
L. Davis
49
21
0
09 Mar 2021
Deep Generative Modelling: A Comparative Review of VAEs, GANs, Normalizing Flows, Energy-Based and Autoregressive Models
Sam Bond-Taylor
Adam Leach
Yang Long
Chris G. Willcocks
VLM
TPM
45
485
0
08 Mar 2021
Large Pre-trained Language Models Contain Human-like Biases of What is Right and Wrong to Do
P. Schramowski
Cigdem Turan
Nico Andersen
Constantin Rothkopf
Kristian Kersting
33
281
0
08 Mar 2021
Behavior From the Void: Unsupervised Active Pre-Training
Hao Liu
Pieter Abbeel
VLM
SSL
43
195
0
08 Mar 2021
Greedy Hierarchical Variational Autoencoders for Large-Scale Video Prediction
Bohan Wu
Suraj Nair
Roberto Martin-Martin
Li Fei-Fei
Chelsea Finn
DRL
27
99
0
06 Mar 2021
Measuring Mathematical Problem Solving With the MATH Dataset
Dan Hendrycks
Collin Burns
Saurav Kadavath
Akul Arora
Steven Basart
Eric Tang
D. Song
Jacob Steinhardt
ReLM
FaML
84
1,911
0
05 Mar 2021
OperA: Attention-Regularized Transformers for Surgical Phase Recognition
Tobias Czempiel
Magdalini Paschali
D. Ostler
S. T. Kim
Benjamin Busam
Nassir Navab
MedIm
44
86
0
05 Mar 2021
Generating Images with Sparse Representations
C. Nash
Jacob Menick
Sander Dieleman
Peter W. Battaglia
33
201
0
05 Mar 2021
Training a First-Order Theorem Prover from Synthetic Data
Vlad Firoiu
Eser Aygun
Ankit Anand
Zafarali Ahmed
Xavier Glorot
Laurent Orseau
Lei Zhang
Doina Precup
Shibl Mourad
NAI
21
13
0
05 Mar 2021
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong
Jean-Baptiste Cordonnier
Andreas Loukas
52
373
0
05 Mar 2021
Moshpit SGD: Communication-Efficient Decentralized Training on Heterogeneous Unreliable Devices
Max Ryabinin
Eduard A. Gorbunov
Vsevolod Plokhotnyuk
Gennady Pekhimenko
37
33
0
04 Mar 2021
OAG-BERT: Towards A Unified Backbone Language Model For Academic Knowledge Services
Xiao Liu
Da Yin
Jingnan Zheng
Xingjian Zhang
Peng Zhang
Hongxia Yang
Yuxiao Dong
Jie Tang
VLM
45
31
0
03 Mar 2021
Random Feature Attention
Hao Peng
Nikolaos Pappas
Dani Yogatama
Roy Schwartz
Noah A. Smith
Lingpeng Kong
36
349
0
03 Mar 2021
Disentangling Syntax and Semantics in the Brain with Deep Networks
Charlotte Caucheteux
Alexandre Gramfort
J. King
36
70
0
02 Mar 2021
Generalizing to Unseen Domains: A Survey on Domain Generalization
Jindong Wang
Cuiling Lan
Chang-Shu Liu
Yidong Ouyang
Tao Qin
Wang Lu
Yiqiang Chen
Wenjun Zeng
Philip S. Yu
OOD
56
1,177
0
02 Mar 2021
M6: A Chinese Multimodal Pretrainer
Junyang Lin
Rui Men
An Yang
Chan Zhou
Ming Ding
...
Yong Li
Wei Lin
Jingren Zhou
J. Tang
Hongxia Yang
VLM
MoE
37
133
0
01 Mar 2021
Query Rewriting via Cycle-Consistent Translation for E-Commerce Search
Yiming Qiu
Kang Zhang
Han Zhang
Songlin Wang
Sulong Xu
Yun Xiao
Bo Long
Wen-Yun Yang
54
16
0
01 Mar 2021
On the Utility of Gradient Compression in Distributed Training Systems
Saurabh Agarwal
Hongyi Wang
Shivaram Venkataraman
Dimitris Papailiopoulos
38
46
0
28 Feb 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
319
1,528
0
27 Feb 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
188
27,929
0
26 Feb 2021
Investigating the Limitations of Transformers with Simple Arithmetic Tasks
Rodrigo Nogueira
Zhiying Jiang
Jimmy Lin
LRM
24
123
0
25 Feb 2021
Self-Tuning for Data-Efficient Deep Learning
Ximei Wang
Jing Gao
Mingsheng Long
Jianmin Wang
BDL
30
70
0
25 Feb 2021
SparseBERT: Rethinking the Importance Analysis in Self-attention
Han Shi
Jiahui Gao
Xiaozhe Ren
Hang Xu
Xiaodan Liang
Zhenguo Li
James T. Kwok
23
54
0
25 Feb 2021
Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning
Victor Campos
Pablo Sprechmann
Steven Hansen
André Barreto
Steven Kapturowski
Alex Vitvitskyi
Adria Puigdomenech Badia
Charles Blundell
OffRL
OnRL
41
25
0
24 Feb 2021
PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains
Eyal Ben-David
Nadav Oved
Roi Reichart
VLM
OOD
17
88
0
24 Feb 2021
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models
Harold Ott
Jasmin Bogatinovski
Alexander Acker
S. Nedelkoski
O. Kao
11
29
0
23 Feb 2021
Position Information in Transformers: An Overview
Philipp Dufter
Martin Schmitt
Hinrich Schütze
18
141
0
22 Feb 2021
Revisiting Classification Perspective on Scene Text Recognition
Hongxiang Cai
Jun Sun
Yichao Xiong
24
10
0
22 Feb 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu
Amanpreet Singh
ViT
25
296
0
22 Feb 2021
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
Cameron R. Wolfe
Jingkang Yang
Arindam Chowdhury
Chen Dun
Artun Bayer
Santiago Segarra
Anastasios Kyrillidis
BDL
GNN
LRM
54
9
0
20 Feb 2021
Improved Denoising Diffusion Probabilistic Models
Alex Nichol
Prafulla Dhariwal
DiffM
60
3,549
0
18 Feb 2021
Meta-Transfer Learning for Low-Resource Abstractive Summarization
Yi-Syuan Chen
Hong-Han Shuai
CLL
OffRL
48
38
0
18 Feb 2021
Training Large-Scale News Recommenders with Pretrained Language Models in the Loop
Shitao Xiao
Zheng Liu
Yingxia Shao
Tao Di
Xing Xie
VLM
AIFin
127
41
0
18 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
302
1,086
0
17 Feb 2021
Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release
Liam H. Fowl
Ping-yeh Chiang
Micah Goldblum
Jonas Geiping
Arpit Bansal
W. Czaja
Tom Goldstein
24
43
0
16 Feb 2021
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Itay Hubara
Brian Chmiel
Moshe Island
Ron Banner
S. Naor
Daniel Soudry
59
111
0
16 Feb 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
Yifan Jiang
Tom Goldstein
ODL
41
54
0
16 Feb 2021
Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet
M. O. Topal
Anil Bas
Imke van Heerden
LLMAG
AI4CE
26
88
0
16 Feb 2021
Training Larger Networks for Deep Reinforcement Learning
Keita Ota
Devesh K. Jha
Asako Kanezaki
OffRL
37
39
0
16 Feb 2021