ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2005.14165
  4. Cited By
Language Models are Few-Shot Learners
v1v2v3v4 (latest)

Language Models are Few-Shot Learners

28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Ma-teusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
    BDL
ArXiv (abs)PDFHTML

Papers citing "Language Models are Few-Shot Learners"

50 / 12,200 papers shown
Title
CoCon: A Self-Supervised Approach for Controlled Text Generation
CoCon: A Self-Supervised Approach for Controlled Text Generation
Alvin Chan
Yew-Soon Ong
B. Pung
Aston Zhang
Jie Fu
79
86
0
05 Jun 2020
MLE-guided parameter search for task loss minimization in neural
  sequence modeling
MLE-guided parameter search for task loss minimization in neural sequence modeling
Sean Welleck
Kyunghyun Cho
59
10
0
04 Jun 2020
Serving DNNs like Clockwork: Performance Predictability from the Bottom
  Up
Serving DNNs like Clockwork: Performance Predictability from the Bottom Up
A. Gujarati
Reza Karimi
Safya Alzayat
Wei Hao
Antoine Kaufmann
Ymir Vigfusson
Jonathan Mace
109
285
0
03 Jun 2020
A Survey on Transfer Learning in Natural Language Processing
A Survey on Transfer Learning in Natural Language Processing
Zaid Alyafeai
Maged S. Alshaibani
Irfan Ahmad
91
75
0
31 May 2020
Transferring Inductive Biases through Knowledge Distillation
Transferring Inductive Biases through Knowledge Distillation
Samira Abnar
Mostafa Dehghani
Willem H. Zuidema
90
60
0
31 May 2020
Predict-then-Decide: A Predictive Approach for Wait or Answer Task in
  Dialogue Systems
Predict-then-Decide: A Predictive Approach for Wait or Answer Task in Dialogue Systems
Zehao Lin
Shaobo Cui
Guodun Li
Xiaoming Kang
Feng Ji
Feng-Lin Li
Zhongzhou Zhao
Haiqing Chen
Yin Zhang
60
2
0
27 May 2020
Med-BERT: pre-trained contextualized embeddings on large-scale
  structured electronic health records for disease prediction
Med-BERT: pre-trained contextualized embeddings on large-scale structured electronic health records for disease prediction
L. Rasmy
Yang Xiang
Z. Xie
Cui Tao
Degui Zhi
AI4MHLM&MA
101
698
0
22 May 2020
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Movement Pruning: Adaptive Sparsity by Fine-Tuning
Victor Sanh
Thomas Wolf
Alexander M. Rush
79
487
0
15 May 2020
Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora
  Extraction
Tailoring and Evaluating the Wikipedia for in-Domain Comparable Corpora Extraction
C. España-Bonet
Alberto Barrón-Cedeño
Lluís Marquez
22
9
0
03 May 2020
Reinforcement Learning with Augmented Data
Reinforcement Learning with Augmented Data
Michael Laskin
Kimin Lee
Adam Stooke
Lerrel Pinto
Pieter Abbeel
A. Srinivas
OffRL
124
660
0
30 Apr 2020
Explainable Deep Learning: A Field Guide for the Uninitiated
Explainable Deep Learning: A Field Guide for the Uninitiated
Gabrielle Ras
Ning Xie
Marcel van Gerven
Derek Doran
AAMLXAI
111
379
0
30 Apr 2020
Deep Learning for Time Series Forecasting: Tutorial and Literature
  Survey
Deep Learning for Time Series Forecasting: Tutorial and Literature Survey
Konstantinos Benidis
Syama Sundar Rangapuram
Valentin Flunkert
Bernie Wang
Danielle C. Maddix
...
David Salinas
Lorenzo Stella
François-Xavier Aubet
Laurent Callot
Tim Januschowski
AI4TS
99
200
0
21 Apr 2020
Experience Grounds Language
Experience Grounds Language
Yonatan Bisk
Ari Holtzman
Jesse Thomason
Jacob Andreas
Yoshua Bengio
...
Angeliki Lazaridou
Jonathan May
Aleksandr Nisnevich
Nicolas Pinto
Joseph P. Turian
102
360
0
21 Apr 2020
Improving Readability for Automatic Speech Recognition Transcription
Improving Readability for Automatic Speech Recognition Transcription
Junwei Liao
Sefik Emre Eskimez
Liyang Lu
Yu Shi
Ming Gong
Linjun Shou
Hong Qu
Michael Zeng
67
56
0
09 Apr 2020
Self-Induced Curriculum Learning in Self-Supervised Neural Machine
  Translation
Self-Induced Curriculum Learning in Self-Supervised Neural Machine Translation
Dana Ruiter
Josef van Genabith
C. España-Bonet
SSL
51
3
0
07 Apr 2020
Deep Learning Based Text Classification: A Comprehensive Review
Deep Learning Based Text Classification: A Comprehensive Review
Shervin Minaee
Nal Kalchbrenner
Min Zhang
Narjes Nikzad
M. Asgari-Chenaghlu
Jianfeng Gao
AILawVLMAI4TS
116
1,113
0
06 Apr 2020
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Optimus: Organizing Sentences via Pre-trained Modeling of a Latent Space
Chunyuan Li
Xiang Gao
Yuan Li
Baolin Peng
Xiujun Li
Yizhe Zhang
Jianfeng Gao
SSLDRL
86
182
0
05 Apr 2020
A Low-cost Fault Corrector for Deep Neural Networks through Range
  Restriction
A Low-cost Fault Corrector for Deep Neural Networks through Range Restriction
Zitao Chen
Guanpeng Li
Karthik Pattabiraman
AAMLAI4CE
87
17
0
30 Mar 2020
Machine learning as a model for cultural learning: Teaching an algorithm
  what it means to be fat
Machine learning as a model for cultural learning: Teaching an algorithm what it means to be fat
Alina Arseniev-Koehler
J. Foster
74
49
0
24 Mar 2020
Pre-trained Models for Natural Language Processing: A Survey
Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu
Tianxiang Sun
Yige Xu
Yunfan Shao
Ning Dai
Xuanjing Huang
LM&MAVLM
383
1,495
0
18 Mar 2020
ReZero is All You Need: Fast Convergence at Large Depth
ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner
Bodhisattwa Prasad Majumder
H. H. Mao
G. Cottrell
Julian McAuley
AI4CE
89
282
0
10 Mar 2020
Teaching Temporal Logics to Neural Networks
Teaching Temporal Logics to Neural Networks
Christopher Hahn
Frederik Schmitt
Jens U. Kreber
M. Rabe
Bernd Finkbeiner
NAI
109
67
0
06 Mar 2020
Iterative Averaging in the Quest for Best Test Error
Iterative Averaging in the Quest for Best Test Error
Diego Granziol
Xingchen Wan
Samuel Albanie
Stephen J. Roberts
66
3
0
02 Mar 2020
Loss landscapes and optimization in over-parameterized non-linear
  systems and neural networks
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
Chaoyue Liu
Libin Zhu
M. Belkin
ODL
98
266
0
29 Feb 2020
On Biased Compression for Distributed Learning
On Biased Compression for Distributed Learning
Aleksandr Beznosikov
Samuel Horváth
Peter Richtárik
M. Safaryan
75
189
0
27 Feb 2020
A Primer in BERTology: What we know about how BERT works
A Primer in BERTology: What we know about how BERT works
Anna Rogers
Olga Kovaleva
Anna Rumshisky
OffRL
125
1,507
0
27 Feb 2020
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh
Yao Chen
Xin Lou
Mohammad Ali Khan
Yifan Yang
Hassan Sajjad
Preslav Nakov
Deming Chen
Marianne Winslett
AI4CE
127
201
0
27 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using
  Decentralized Mixture-of-Experts
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
82
52
0
10 Feb 2020
Language Models Are An Effective Patient Representation Learning
  Technique For Electronic Health Record Data
Language Models Are An Effective Patient Representation Learning Technique For Electronic Health Record Data
E. Steinberg
Kenneth Jung
Jason Alan Fries
Conor K. Corbin
Stephen Pfohl
N. Shah
90
110
0
06 Jan 2020
Fast and energy-efficient neuromorphic deep learning with first-spike
  times
Fast and energy-efficient neuromorphic deep learning with first-spike times
Julian Goltz
Laura Kriener
A. Baumbach
Sebastian Billaudelle
O. Breitwieser
...
Á. F. Kungl
Walter Senn
Johannes Schemmel
K. Meier
Mihai A. Petrovici
137
132
0
24 Dec 2019
Extending Machine Language Models toward Human-Level Language
  Understanding
Extending Machine Language Models toward Human-Level Language Understanding
James L. McClelland
Felix Hill
Maja R. Rudolph
Jason Baldridge
Hinrich Schütze
LRM
78
35
0
12 Dec 2019
Attentive Representation Learning with Adversarial Training for Short
  Text Clustering
Attentive Representation Learning with Adversarial Training for Short Text Clustering
Wei Zhang
Chao Dong
Jianhua Yin
Jianyong Wang
60
13
0
08 Dec 2019
Blockwise Self-Attention for Long Document Understanding
Blockwise Self-Attention for Long Document Understanding
J. Qiu
Hao Ma
Omer Levy
Scott Yih
Sinong Wang
Jie Tang
109
254
0
07 Nov 2019
Discovering the Compositional Structure of Vector Representations with
  Role Learning Networks
Discovering the Compositional Structure of Vector Representations with Role Learning Networks
Paul Soulos
R. Thomas McCoy
Tal Linzen
P. Smolensky
CoGe
127
44
0
21 Oct 2019
Demon: Improved Neural Network Training with Momentum Decay
Demon: Improved Neural Network Training with Momentum Decay
John Chen
Cameron R. Wolfe
Zhaoqi Li
Anastasios Kyrillidis
ODL
106
15
0
11 Oct 2019
On the adequacy of untuned warmup for adaptive optimization
On the adequacy of untuned warmup for adaptive optimization
Jerry Ma
Denis Yarats
106
70
0
09 Oct 2019
DeepONet: Learning nonlinear operators for identifying differential
  equations based on the universal approximation theorem of operators
DeepONet: Learning nonlinear operators for identifying differential equations based on the universal approximation theorem of operators
Lu Lu
Pengzhan Jin
George Karniadakis
248
2,171
0
08 Oct 2019
Soft-Label Dataset Distillation and Text Dataset Distillation
Soft-Label Dataset Distillation and Text Dataset Distillation
Ilia Sucholutsky
Matthias Schonlau
DD
141
135
0
06 Oct 2019
Distributed Learning of Deep Neural Networks using Independent Subnet
  Training
Distributed Learning of Deep Neural Networks using Independent Subnet Training
John Shelton Hyatt
Cameron R. Wolfe
Michael Lee
Yuxin Tang
Anastasios Kyrillidis
Christopher M. Jermaine
OOD
92
39
0
04 Oct 2019
Semi-supervised Thai Sentence Segmentation Using Local and Distant Word
  Representations
Semi-supervised Thai Sentence Segmentation Using Local and Distant Word Representations
Chanatip Saetia
Ekapol Chuangsuwanich
Tawunrat Chalothorn
P. Vateekul
72
5
0
04 Aug 2019
Trends in Integration of Vision and Language Research: A Survey of
  Tasks, Datasets, and Methods
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods
Aditya Mogadala
M. Kalimuthu
Dietrich Klakow
VLM
141
136
0
22 Jul 2019
Norms for Beneficial A.I.: A Computational Analysis of the Societal
  Value Alignment Problem
Norms for Beneficial A.I.: A Computational Analysis of the Societal Value Alignment Problem
Pedro M. Fernandes
Francisco C. Santos
Manuel Lopes
35
11
0
26 Jun 2019
Exposure Bias versus Self-Recovery: Are Distortions Really Incremental
  for Autoregressive Text Generation?
Exposure Bias versus Self-Recovery: Are Distortions Really Incremental for Autoregressive Text Generation?
Tianxing He
Jingzhao Zhang
Zhiming Zhou
James R. Glass
99
32
0
25 May 2019
An Information Theoretic Interpretation to Deep Neural Networks
An Information Theoretic Interpretation to Deep Neural Networks
Shao-Lun Huang
Xiangxiang Xu
Lizhong Zheng
G. Wornell
FAtt
90
44
0
16 May 2019
An Attentive Survey of Attention Models
An Attentive Survey of Attention Models
S. Chaudhari
Varun Mithal
Gungor Polatkan
R. Ramanath
192
664
0
05 Apr 2019
A Brain-inspired Algorithm for Training Highly Sparse Neural Networks
A Brain-inspired Algorithm for Training Highly Sparse Neural Networks
Zahra Atashgahi
Joost Pieterse
Shiwei Liu
Decebal Constantin Mocanu
Raymond N. J. Veldhuis
Mykola Pechenizkiy
74
15
0
17 Mar 2019
Investigating Antigram Behaviour using Distributional Semantics
Investigating Antigram Behaviour using Distributional Semantics
Saptarshi Sengupta
31
0
0
15 Jan 2019
Automated Machine Learning: From Principles to Practices
Automated Machine Learning: From Principles to Practices
Quanming Yao
Mengshuo Wang
Hugo Jair Escalante
Huan Zhao
Qiang Yang
105
259
0
31 Oct 2018
Deep Learning for Genomics: A Concise Overview
Deep Learning for Genomics: A Concise Overview
Tianwei Yue
Yuanxin Wang
Longxiang Zhang
Chunming Gu
Haohan Wang
Wenping Wang
Qi Lyu
Yujie Dun
AILawVLMBDL
86
91
0
02 Feb 2018
Quantifying the probable approximation error of probabilistic inference
  programs
Quantifying the probable approximation error of probabilistic inference programs
Marco F. Cusumano-Towner
Vikash K. Mansinghka
100
5
0
31 May 2016
Previous
123...242243244