ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.14165
  4. Cited By
Language Models are Few-Shot Learners

Language Models are Few-Shot Learners

28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Ma-teusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
    BDL
ArXivPDFHTML

Papers citing "Language Models are Few-Shot Learners"

50 / 11,588 papers shown
Title
Investigating the Limitations of Transformers with Simple Arithmetic
  Tasks
Investigating the Limitations of Transformers with Simple Arithmetic Tasks
Rodrigo Nogueira
Zhiying Jiang
Jimmy J. Li
LRM
24
123
0
25 Feb 2021
Self-Tuning for Data-Efficient Deep Learning
Self-Tuning for Data-Efficient Deep Learning
Ximei Wang
Jing Gao
Mingsheng Long
Jianmin Wang
BDL
30
70
0
25 Feb 2021
SparseBERT: Rethinking the Importance Analysis in Self-attention
SparseBERT: Rethinking the Importance Analysis in Self-attention
Han Shi
Jiahui Gao
Xiaozhe Ren
Hang Xu
Xiaodan Liang
Zhenguo Li
James T. Kwok
23
54
0
25 Feb 2021
Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning
Beyond Fine-Tuning: Transferring Behavior in Reinforcement Learning
Victor Campos
Pablo Sprechmann
Steven Hansen
André Barreto
Steven Kapturowski
Alex Vitvitskyi
Adria Puigdomenech Badia
Charles Blundell
OffRL
OnRL
41
25
0
24 Feb 2021
PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen
  Domains
PADA: Example-based Prompt Learning for on-the-fly Adaptation to Unseen Domains
Eyal Ben-David
Nadav Oved
Roi Reichart
VLM
OOD
17
88
0
24 Feb 2021
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained
  Language Models
Robust and Transferable Anomaly Detection in Log Data using Pre-Trained Language Models
Harold Ott
Jasmin Bogatinovski
Alexander Acker
S. Nedelkoski
O. Kao
11
29
0
23 Feb 2021
Position Information in Transformers: An Overview
Position Information in Transformers: An Overview
Philipp Dufter
Martin Schmitt
Hinrich Schütze
24
141
0
22 Feb 2021
Revisiting Classification Perspective on Scene Text Recognition
Revisiting Classification Perspective on Scene Text Recognition
Hongxiang Cai
Jun Sun
Yichao Xiong
24
10
0
22 Feb 2021
UniT: Multimodal Multitask Learning with a Unified Transformer
UniT: Multimodal Multitask Learning with a Unified Transformer
Ronghang Hu
Amanpreet Singh
ViT
25
296
0
22 Feb 2021
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
GIST: Distributed Training for Large-Scale Graph Convolutional Networks
Cameron R. Wolfe
Jingkang Yang
Arindam Chowdhury
Chen Dun
Artun Bayer
Santiago Segarra
Anastasios Kyrillidis
BDL
GNN
LRM
54
9
0
20 Feb 2021
Improved Denoising Diffusion Probabilistic Models
Improved Denoising Diffusion Probabilistic Models
Alex Nichol
Prafulla Dhariwal
DiffM
60
3,549
0
18 Feb 2021
Meta-Transfer Learning for Low-Resource Abstractive Summarization
Meta-Transfer Learning for Low-Resource Abstractive Summarization
Yi-Syuan Chen
Hong-Han Shuai
CLL
OffRL
48
38
0
18 Feb 2021
Training Large-Scale News Recommenders with Pretrained Language Models
  in the Loop
Training Large-Scale News Recommenders with Pretrained Language Models in the Loop
Shitao Xiao
Zheng Liu
Yingxia Shao
Tao Di
Xing Xie
VLM
AIFin
127
41
0
18 Feb 2021
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize
  Long-Tail Visual Concepts
Conceptual 12M: Pushing Web-Scale Image-Text Pre-Training To Recognize Long-Tail Visual Concepts
Soravit Changpinyo
P. Sharma
Nan Ding
Radu Soricut
VLM
302
1,086
0
17 Feb 2021
Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure
  Dataset Release
Preventing Unauthorized Use of Proprietary Data: Poisoning for Secure Dataset Release
Liam H. Fowl
Ping Yeh-Chiang
Micah Goldblum
Jonas Geiping
Arpit Bansal
W. Czaja
Tom Goldstein
24
43
0
16 Feb 2021
Accelerated Sparse Neural Training: A Provable and Efficient Method to
  Find N:M Transposable Masks
Accelerated Sparse Neural Training: A Provable and Efficient Method to Find N:M Transposable Masks
Itay Hubara
Brian Chmiel
Moshe Island
Ron Banner
S. Naor
Daniel Soudry
59
111
0
16 Feb 2021
GradInit: Learning to Initialize Neural Networks for Stable and
  Efficient Training
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
Yifan Jiang
Tom Goldstein
ODL
41
54
0
16 Feb 2021
Exploring Transformers in Natural Language Generation: GPT, BERT, and
  XLNet
Exploring Transformers in Natural Language Generation: GPT, BERT, and XLNet
M. O. Topal
Anil Bas
Imke van Heerden
LLMAG
AI4CE
26
88
0
16 Feb 2021
Training Larger Networks for Deep Reinforcement Learning
Training Larger Networks for Deep Reinforcement Learning
Keita Ota
Devesh K. Jha
Asako Kanezaki
OffRL
37
39
0
16 Feb 2021
Prompt Programming for Large Language Models: Beyond the Few-Shot
  Paradigm
Prompt Programming for Large Language Models: Beyond the Few-Shot Paradigm
Laria Reynolds
Kyle McDonell
58
849
0
15 Feb 2021
Understanding Negative Samples in Instance Discriminative
  Self-supervised Representation Learning
Understanding Negative Samples in Instance Discriminative Self-supervised Representation Learning
Kento Nozawa
Issei Sato
SSL
22
43
0
13 Feb 2021
Explaining Neural Scaling Laws
Explaining Neural Scaling Laws
Yasaman Bahri
Ethan Dyer
Jared Kaplan
Jaehoon Lee
Utkarsh Sharma
27
250
0
12 Feb 2021
Proof Artifact Co-training for Theorem Proving with Language Models
Proof Artifact Co-training for Theorem Proving with Language Models
Jesse Michael Han
Jason M. Rute
Yuhuai Wu
Edward W. Ayers
Stanislas Polu
AIMat
27
121
0
11 Feb 2021
Cross-Domain Multi-Task Learning for Sequential Sentence Classification
  in Research Papers
Cross-Domain Multi-Task Learning for Sequential Sentence Classification in Research Papers
Arthur Brack
Anett Hoppe
Pascal Buschermöhle
Ralph Ewerth
27
18
0
11 Feb 2021
NAST: Non-Autoregressive Spatial-Temporal Transformer for Time Series Forecasting
Kai Chen
Guang Chen
Dan Xu
Lijun Zhang
Yuyao Huang
Alois C. Knoll
AI4TS
22
21
0
10 Feb 2021
A Framework for Auditing Data Center Energy Usage and Mitigating
  Environmental Footprint
A Framework for Auditing Data Center Energy Usage and Mitigating Environmental Footprint
Justin Gould
17
1
0
08 Feb 2021
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
Learning N:M Fine-grained Structured Sparse Neural Networks From Scratch
Aojun Zhou
Yukun Ma
Junnan Zhu
Jianbo Liu
Zhijie Zhang
Kun Yuan
Wenxiu Sun
Hongsheng Li
69
240
0
08 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Joey Tianyi Zhou
MLLM
277
525
0
04 Feb 2021
Embodied Intelligence via Learning and Evolution
Embodied Intelligence via Learning and Evolution
Agrim Gupta
Silvio Savarese
Surya Ganguli
Li Fei-Fei
AI4CE
22
232
0
03 Feb 2021
When Can Models Learn From Explanations? A Formal Framework for
  Understanding the Roles of Explanation Data
When Can Models Learn From Explanations? A Formal Framework for Understanding the Roles of Explanation Data
Peter Hase
Joey Tianyi Zhou
XAI
25
87
0
03 Feb 2021
Mind the Gap: Assessing Temporal Generalization in Neural Language
  Models
Mind the Gap: Assessing Temporal Generalization in Neural Language Models
Angeliki Lazaridou
A. Kuncoro
E. Gribovskaya
Devang Agrawal
Adam Liska
...
Sebastian Ruder
Dani Yogatama
Kris Cao
Susannah Young
Phil Blunsom
VLM
41
207
0
03 Feb 2021
MAUVE: Measuring the Gap Between Neural Text and Human Text using
  Divergence Frontiers
MAUVE: Measuring the Gap Between Neural Text and Human Text using Divergence Frontiers
Krishna Pillutla
Swabha Swayamdipta
Rowan Zellers
John Thickstun
Sean Welleck
Yejin Choi
Zaïd Harchaoui
45
343
0
02 Feb 2021
Measuring and Improving Consistency in Pretrained Language Models
Measuring and Improving Consistency in Pretrained Language Models
Yanai Elazar
Nora Kassner
Shauli Ravfogel
Abhilasha Ravichander
Eduard H. Hovy
Hinrich Schütze
Yoav Goldberg
HILM
272
347
0
01 Feb 2021
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained
  Language Models
Multilingual LAMA: Investigating Knowledge in Multilingual Pretrained Language Models
Nora Kassner
Philipp Dufter
Hinrich Schütze
26
134
0
01 Feb 2021
Decoupling the Role of Data, Attention, and Losses in Multimodal
  Transformers
Decoupling the Role of Data, Attention, and Losses in Multimodal Transformers
Lisa Anne Hendricks
John F. J. Mellor
R. Schneider
Jean-Baptiste Alayrac
Aida Nematzadeh
79
110
0
31 Jan 2021
Can We Automate Scientific Reviewing?
Can We Automate Scientific Reviewing?
Weizhe Yuan
Pengfei Liu
Graham Neubig
85
84
0
30 Jan 2021
BENDR: using transformers and a contrastive self-supervised learning
  task to learn from massive amounts of EEG data
BENDR: using transformers and a contrastive self-supervised learning task to learn from massive amounts of EEG data
Demetres Kostas
Stephane Aroca-Ouellette
Frank Rudzicz
SSL
54
203
0
28 Jan 2021
Bottleneck Transformers for Visual Recognition
Bottleneck Transformers for Visual Recognition
A. Srinivas
Nayeon Lee
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
290
981
0
27 Jan 2021
TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models
TT-Rec: Tensor Train Compression for Deep Learning Recommendation Models
Chunxing Yin
Bilge Acun
Xing Liu
Carole-Jean Wu
50
103
0
25 Jan 2021
Pruning and Quantization for Deep Neural Network Acceleration: A Survey
Pruning and Quantization for Deep Neural Network Acceleration: A Survey
Tailin Liang
C. Glossner
Lei Wang
Shaobo Shi
Xiaotong Zhang
MQ
150
676
0
24 Jan 2021
Word Alignment by Fine-tuning Embeddings on Parallel Corpora
Word Alignment by Fine-tuning Embeddings on Parallel Corpora
Zi-Yi Dou
Graham Neubig
98
258
0
20 Jan 2021
Towards Facilitating Empathic Conversations in Online Mental Health
  Support: A Reinforcement Learning Approach
Towards Facilitating Empathic Conversations in Online Mental Health Support: A Reinforcement Learning Approach
Ashish Sharma
Inna Wanyin Lin
Adam S. Miner
David C. Atkins
Tim Althoff
AI4MH
25
140
0
19 Jan 2021
Diagnostic Captioning: A Survey
Diagnostic Captioning: A Survey
John Pavlopoulos
Vasiliki Kougia
Ion Androutsopoulos
D. Papamichail
3DV
MedIm
91
26
0
18 Jan 2021
ZeRO-Offload: Democratizing Billion-Scale Model Training
ZeRO-Offload: Democratizing Billion-Scale Model Training
Jie Ren
Samyam Rajbhandari
Reza Yazdani Aminabadi
Olatunji Ruwase
Shuangyang Yang
Minjia Zhang
Dong Li
Yuxiong He
MoE
177
417
0
18 Jan 2021
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
LIME: Learning Inductive Bias for Primitives of Mathematical Reasoning
Yuhuai Wu
M. Rabe
Wenda Li
Jimmy Ba
Roger C. Grosse
Christian Szegedy
AIMat
LRM
75
53
0
15 Jan 2021
Counterfactual Generative Networks
Counterfactual Generative Networks
Axel Sauer
Andreas Geiger
OOD
BDL
CML
43
124
0
15 Jan 2021
Persistent Anti-Muslim Bias in Large Language Models
Persistent Anti-Muslim Bias in Large Language Models
Abubakar Abid
Maheen Farooqi
James Zou
AILaw
22
538
0
14 Jan 2021
GAN Inversion: A Survey
GAN Inversion: A Survey
Weihao Xia
Yulun Zhang
Yujiu Yang
Jing-Hao Xue
Bolei Zhou
Ming-Hsuan Yang
DiffM
70
507
0
14 Jan 2021
Of Non-Linearity and Commutativity in BERT
Of Non-Linearity and Commutativity in BERT
Sumu Zhao
Damian Pascual
Gino Brunner
Roger Wattenhofer
36
16
0
12 Jan 2021
A Convergence Theory Towards Practical Over-parameterized Deep Neural
  Networks
A Convergence Theory Towards Practical Over-parameterized Deep Neural Networks
Asaf Noy
Yi Tian Xu
Y. Aflalo
Lihi Zelnik-Manor
Rong Jin
41
3
0
12 Jan 2021
Previous
123...226227228...230231232
Next