ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.14165
  4. Cited By
Language Models are Few-Shot Learners

Language Models are Few-Shot Learners

28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Ma-teusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
    BDL
ArXivPDFHTML

Papers citing "Language Models are Few-Shot Learners"

50 / 11,513 papers shown
Title
Facts as Experts: Adaptable and Interpretable Neural Memory over
  Symbolic Knowledge
Facts as Experts: Adaptable and Interpretable Neural Memory over Symbolic Knowledge
Pat Verga
Haitian Sun
Livio Baldini Soares
William W. Cohen
KELM
35
50
0
02 Jul 2020
Computing Conceptual Distances between Breast Cancer Screening
  Guidelines: An Implementation of a Near-Peer Epistemic Model of Medical
  Disagreement
Computing Conceptual Distances between Breast Cancer Screening Guidelines: An Implementation of a Near-Peer Epistemic Model of Medical Disagreement
Hossein Hematialam
Luciana D. Garbayo
Seethalakshmi Gopalakrishnan
Wlodek Zadrozny
11
1
0
01 Jul 2020
Data Movement Is All You Need: A Case Study on Optimizing Transformers
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov
Nikoli Dryden
Tal Ben-Nun
Shigang Li
Torsten Hoefler
36
131
0
30 Jun 2020
Technical Report: Auxiliary Tuning and its Application to Conditional
  Text Generation
Technical Report: Auxiliary Tuning and its Application to Conditional Text Generation
Yoel Zeldes
Dan Padnos
Or Sharir
Barak Peleg
31
19
0
30 Jun 2020
PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning
PLATO-2: Towards Building an Open-Domain Chatbot via Curriculum Learning
Siqi Bao
H. He
Fan Wang
Hua Wu
Haifeng Wang
Wenquan Wu
Zhen Guo
Zhibin Liu
Xinchao Xu
30
137
0
30 Jun 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic
  Sharding
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhehuai Chen
MoE
43
1,116
0
30 Jun 2020
Natural Backdoor Attack on Text Data
Natural Backdoor Attack on Text Data
Lichao Sun
SILM
19
39
0
29 Jun 2020
Answering Questions on COVID-19 in Real-Time
Answering Questions on COVID-19 in Real-Time
Jinhyuk Lee
Sean S. Yi
Minbyul Jeong
Mujeen Sung
Wonjin Yoon
Yonghwa Choi
Miyoung Ko
Jaewoo Kang
21
43
0
29 Jun 2020
Evaluation of Text Generation: A Survey
Evaluation of Text Generation: A Survey
Asli Celikyilmaz
Elizabeth Clark
Jianfeng Gao
ELM
LM&MA
19
377
0
26 Jun 2020
Direct Feedback Alignment Scales to Modern Deep Learning Tasks and
  Architectures
Direct Feedback Alignment Scales to Modern Deep Learning Tasks and Architectures
Julien Launay
Iacopo Poli
Franccois Boniface
Florent Krzakala
41
63
0
23 Jun 2020
The Depth-to-Width Interplay in Self-Attention
The Depth-to-Width Interplay in Self-Attention
Yoav Levine
Noam Wies
Or Sharir
Hofit Bata
Amnon Shashua
30
45
0
22 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of
  Gradients
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
35
2
0
21 Jun 2020
A Qualitative Evaluation of Language Models on Automatic
  Question-Answering for COVID-19
A Qualitative Evaluation of Language Models on Automatic Question-Answering for COVID-19
David Oniani
Yanshan Wang
24
32
0
19 Jun 2020
An adaptive stochastic gradient-free approach for high-dimensional
  blackbox optimization
An adaptive stochastic gradient-free approach for high-dimensional blackbox optimization
Anton Dereventsov
Clayton Webster
Joseph Daws
22
10
0
18 Jun 2020
When Does Preconditioning Help or Hurt Generalization?
When Does Preconditioning Help or Hurt Generalization?
S. Amari
Jimmy Ba
Roger C. Grosse
Xuechen Li
Atsushi Nitanda
Taiji Suzuki
Denny Wu
Ji Xu
36
32
0
18 Jun 2020
On the Predictability of Pruning Across Scales
On the Predictability of Pruning Across Scales
Jonathan S. Rosenfeld
Jonathan Frankle
Michael Carbin
Nir Shavit
28
37
0
18 Jun 2020
What Do Neural Networks Learn When Trained With Random Labels?
What Do Neural Networks Learn When Trained With Random Labels?
Hartmut Maennel
Ibrahim M. Alabdulmohsin
Ilya O. Tolstikhin
R. Baldock
Olivier Bousquet
Sylvain Gelly
Daniel Keysers
FedML
48
87
0
18 Jun 2020
Neural Anisotropy Directions
Neural Anisotropy Directions
Guillermo Ortiz-Jiménez
Apostolos Modas
Seyed-Mohsen Moosavi-Dezfooli
P. Frossard
34
16
0
17 Jun 2020
Dynamic Tensor Rematerialization
Dynamic Tensor Rematerialization
Marisa Kirisame
Steven Lyubomirsky
Altan Haan
Jennifer Brennan
Mike He
Jared Roesch
Tianqi Chen
Zachary Tatlock
29
93
0
17 Jun 2020
Memory-Efficient Pipeline-Parallel DNN Training
Memory-Efficient Pipeline-Parallel DNN Training
Deepak Narayanan
Amar Phanishayee
Kaiyu Shi
Xie Chen
Matei A. Zaharia
MoE
36
212
0
16 Jun 2020
Surrogate gradients for analog neuromorphic computing
Surrogate gradients for analog neuromorphic computing
Benjamin Cramer
Sebastian Billaudelle
Simeon Kanya
Aron Leibfried
Andreas Grubl
...
Korbinian Schreiber
Yannik Stradmann
Johannes Weis
Johannes Schemmel
Friedemann Zenke
24
107
0
12 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
30
433
0
11 Jun 2020
Linformer: Self-Attention with Linear Complexity
Linformer: Self-Attention with Linear Complexity
Sinong Wang
Belinda Z. Li
Madian Khabsa
Han Fang
Hao Ma
75
1,651
0
08 Jun 2020
The Lipschitz Constant of Self-Attention
The Lipschitz Constant of Self-Attention
Hyunjik Kim
George Papamakarios
A. Mnih
14
135
0
08 Jun 2020
An Overview of Neural Network Compression
An Overview of Neural Network Compression
James OÑeill
AI4CE
45
98
0
05 Jun 2020
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He
Xiaodong Liu
Jianfeng Gao
Weizhu Chen
AAML
64
2,626
0
05 Jun 2020
A Survey on Transfer Learning in Natural Language Processing
A Survey on Transfer Learning in Natural Language Processing
Zaid Alyafeai
Maged S. Alshaibani
Irfan Ahmad
30
72
0
31 May 2020
Predict-then-Decide: A Predictive Approach for Wait or Answer Task in
  Dialogue Systems
Predict-then-Decide: A Predictive Approach for Wait or Answer Task in Dialogue Systems
Zehao Lin
Shaobo Cui
Guodun Li
Xiaoming Kang
Feng Ji
Feng-Lin Li
Zhongzhou Zhao
Haiqing Chen
Yin Zhang
34
1
0
27 May 2020
Med-BERT: pre-trained contextualized embeddings on large-scale
  structured electronic health records for disease prediction
Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction
L. Rasmy
Yang Xiang
Z. Xie
Cui Tao
Degui Zhi
AI4MH
LM&MA
24
657
0
22 May 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
32
470
0
15 May 2020
How Can We Accelerate Progress Towards Human-like Linguistic
  Generalization?
How Can We Accelerate Progress Towards Human-like Linguistic Generalization?
Tal Linzen
220
190
0
03 May 2020
Reinforcement Learning with Augmented Data
Reinforcement Learning with Augmented Data
Michael Laskin
Kimin Lee
Adam Stooke
Lerrel Pinto
Pieter Abbeel
A. Srinivas
OffRL
20
647
0
30 Apr 2020
Explainable Deep Learning: A Field Guide for the Uninitiated
Explainable Deep Learning: A Field Guide for the Uninitiated
Gabrielle Ras
Ning Xie
Marcel van Gerven
Derek Doran
AAML
XAI
43
371
0
30 Apr 2020
Deep Learning for Time Series Forecasting: Tutorial and Literature
  Survey
Deep Learning for Time Series Forecasting: Tutorial and Literature Survey
Konstantinos Benidis
Syama Sundar Rangapuram
Valentin Flunkert
Bernie Wang
Danielle C. Maddix
...
David Salinas
Lorenzo Stella
François-Xavier Aubet
Laurent Callot
Tim Januschowski
AI4TS
25
176
0
21 Apr 2020
Experience Grounds Language
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
24
351
0
21 Apr 2020
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Chunyuan Li
Xiang Gao
Yuan Li
Baolin Peng
Xiujun Li
Yizhe Zhang
Jianfeng Gao
SSL
DRL
32
181
0
05 Apr 2020
A Low-cost Fault Corrector for Deep Neural Networks through Range
  Restriction
A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction
Zitao Chen
Guanpeng Li
Karthik Pattabiraman
AAML
AI4CE
28
17
0
30 Mar 2020
Machine learning as a model for cultural learning: Teaching an algorithm
  what it means to be fat
Machine learning as a model for cultural learning: Teaching an algorithm what it means to be fat
Alina Arseniev-Koehler
J. Foster
43
46
0
24 Mar 2020
Pre-trained Models for Natural Language Processing: A Survey
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MA
VLM
246
1,454
0
18 Mar 2020
ReZero is All You Need: Fast Convergence at Large Depth
ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
30
276
0
10 Mar 2020
Iterative Averaging in the Quest for Best Test Error
Iterative Averaging in the Quest for Best Test Error
Diego Granziol
Xingchen Wan
Samuel Albanie
Stephen J. Roberts
10
3
0
02 Mar 2020
Loss landscapes and optimization in over-parameterized non-linear
  systems and neural networks
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Chaoyue Liu
Libin Zhu
M. Belkin
ODL
17
248
0
29 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using
  Decentralized Mixture-of-Experts
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
27
48
0
10 Feb 2020
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
264
4,532
0
23 Jan 2020
Exploiting Cloze Questions for Few Shot Text Classification and Natural
  Language Inference
Exploiting Cloze Questions for Few Shot Text Classification and Natural Language Inference
Timo Schick
Hinrich Schütze
258
1,591
0
21 Jan 2020
Language Models Are An Effective Patient Representation Learning
  Technique For Electronic Health Record Data
Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data
E. Steinberg
Kenneth Jung
Jason Alan Fries
Conor K. Corbin
Stephen R. Pfohl
N. Shah
29
103
0
06 Jan 2020
Fast and energy-efficient neuromorphic deep learning with first-spike
  times
Fast and energy-efficient neuromorphic deep learning with first-spike times
Julian Goltz
Laura Kriener
A. Baumbach
Sebastian Billaudelle
O. Breitwieser
...
Á. F. Kungl
Walter Senn
Johannes Schemmel
K. Meier
Mihai A. Petrovici
35
126
0
24 Dec 2019
Blockwise Self-Attention for Long Document Understanding
Blockwise Self-Attention for Long Document Understanding
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
11
252
0
07 Nov 2019
Discovering the Compositional Structure of Vector Representations with
  Role Learning Networks
Discovering the Compositional Structure of Vector Representations with Role Learning Networks
Paul Soulos
R. Thomas McCoy
Tal Linzen
P. Smolensky
CoGe
29
43
0
21 Oct 2019
Demon: Improved Neural Network Training with Momentum Decay
Demon: Improved Neural Network Training with Momentum Decay
John Chen
Cameron R. Wolfe
Zhaoqi Li
Anastasios Kyrillidis
ODL
24
15
0
11 Oct 2019
Previous
123...229230231
Next