ResearchTrend.AI

Language Models are Few-Shot Learners
arXiv:2005.14165 · v4 (latest) · 28 May 2020
Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, Prafulla Dhariwal, Arvind Neelakantan, Pranav Shyam, Girish Sastry, Amanda Askell, Sandhini Agarwal, Ariel Herbert-Voss, Gretchen Krueger, T. Henighan, R. Child, Aditya A. Ramesh, Daniel M. Ziegler, Jeff Wu, Clemens Winter, Christopher Hesse, Mark Chen, Eric Sigler, Mateusz Litwin, Scott Gray, B. Chess, Jack Clark, Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei | BDL

Papers citing "Language Models are Few-Shot Learners"

Showing 50 of 12,243 citing papers.
Learning to Recognize Dialect Features
  Dorottya Demszky, D. Sharma, J. Clark, Vinodkumar Prabhakaran, Jacob Eisenstein | 216 · 39 · 0 | 23 Oct 2020
Long Document Ranking with Query-Directed Sparse Transformer
  Jyun-Yu Jiang, Chenyan Xiong, Chia-Jung Lee, Wei Wang | 71 · 25 · 0 | 23 Oct 2020
Robust Document Representations using Latent Topics and Metadata
  Natraj Raman, Armineh Nourbakhsh, Sameena Shah, Manuela Veloso | 21 · 0 · 0 | 23 Oct 2020
On the Transformer Growth for Progressive BERT Training
  Xiaotao Gu, Liyuan Liu, Hongkun Yu, Jing Li, Chong Chen, Jiawei Han | VLM | 120 · 54 · 0 | 23 Oct 2020
An Analysis of LIME for Text Data
  Dina Mardaoui, Damien Garreau | FAtt | 187 · 45 · 0 | 23 Oct 2020
Towards Zero-Shot Multilingual Synthetic Question and Answer Generation for Cross-Lingual Reading Comprehension
  Siamak Shakeri, Noah Constant, Mihir Kale, Linting Xue | SyDa | 72 · 28 · 0 | 22 Oct 2020
The Turking Test: Can Language Models Understand Instructions?
  Avia Efrat, Omer Levy | ELM, LRM | 114 · 96 · 0 | 22 Oct 2020
Language Models are Open Knowledge Graphs
  Chenguang Wang, Xiao Liu, Basel Alomair | SSL, KELM | 79 · 137 · 0 | 22 Oct 2020
Limitations of Autoregressive Models and Their Alternatives
  Chu-cheng Lin, Aaron Jaech, Xin Li, Matthew R. Gormley, Jason Eisner | 86 · 63 · 0 | 22 Oct 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
  Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby | ViT | 702 · 41,681 · 0 | 22 Oct 2020
AdapterDrop: On the Efficiency of Adapters in Transformers
  Andreas Rücklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych | 125 · 267 · 0 | 22 Oct 2020
Incorporating Stylistic Lexical Preferences in Generative Language Models
  Hrituraj Singh, Gaurav Verma, Balaji Vasan Srinivasan | 23 · 5 · 0 | 22 Oct 2020
MAM: Masked Acoustic Modeling for End-to-End Speech-to-Text Translation
  Junkun Chen, Mingbo Ma, Renjie Zheng, Liang Huang | 90 · 21 · 0 | 22 Oct 2020
Is Retriever Merely an Approximator of Reader?
  Sohee Yang, Minjoon Seo | RALM | 85 · 42 · 0 | 21 Oct 2020
Exploring Sequence-to-Sequence Models for SPARQL Pattern Composition
  Anand Panchbhai, Tommaso Soru, Edgard Marx | 18 · 5 · 0 | 21 Oct 2020
Bootleg: Chasing the Tail with Self-Supervised Named Entity Disambiguation
  Laurel J. Orr, Megan Leszczynski, Simran Arora, Sen Wu, Neel Guha, Xiao Ling, Christopher Ré | 209 · 48 · 0 | 20 Oct 2020
Local Knowledge Powered Conversational Agents
  Sashank Santhanam, Ming-Yu Liu, Raul Puri, Mohammad Shoeybi, M. Patwary, Bryan Catanzaro | 93 · 4 · 0 | 20 Oct 2020
Neural Language Modeling for Contextualized Temporal Graph Generation
  Aman Madaan, Yiming Yang | 101 · 20 · 0 | 20 Oct 2020
Optimism in the Face of Adversity: Understanding and Improving Deep Learning through Adversarial Robustness
  Guillermo Ortiz-Jiménez, Apostolos Modas, Seyed-Mohsen Moosavi-Dezfooli, P. Frossard | AAML | 121 · 48 · 0 | 19 Oct 2020
Consistency and Coherency Enhanced Story Generation
  Wei Wang, Piji Li, Haitao Zheng | 71 · 11 · 0 | 17 Oct 2020
CoDA: Contrast-enhanced and Diversity-promoting Data Augmentation for Natural Language Understanding
  Yanru Qu, Dinghan Shen, Yelong Shen, Sandra Sajeev, Jiawei Han, Weizhu Chen | 204 · 69 · 0 | 16 Oct 2020
Adaptive Dense-to-Sparse Paradigm for Pruning Online Recommendation System with Non-Stationary Data
  Mao Ye, Dhruv Choudhary, Jiecao Yu, Ellie Wen, Zeliang Chen, Jiyan Yang, Jongsoo Park, Qiang Liu, A. Kejariwal | 56 · 9 · 0 | 16 Oct 2020
Reflective Decoding: Beyond Unidirectional Generation with Off-the-Shelf Language Models
  Peter West, Ximing Lu, Ari Holtzman, Chandra Bhagavatula, Jena D. Hwang, Yejin Choi | OffRL | 59 · 13 · 0 | 16 Oct 2020
An Approximation Algorithm for Optimal Subarchitecture Extraction
  Adrian de Wynter | 70 · 1 · 0 | 16 Oct 2020
For self-supervised learning, Rationality implies generalization, provably
  Yamini Bansal, Gal Kaplun, Boaz Barak | OOD, SSL | 112 · 22 · 0 | 16 Oct 2020
The Deep Bootstrap Framework: Good Online Learners are Good Offline Generalizers
  Preetum Nakkiran, Behnam Neyshabur, Hanie Sedghi | OffRL | 97 · 11 · 0 | 16 Oct 2020
PiRhDy: Learning Pitch-, Rhythm-, and Dynamics-aware Embeddings for Symbolic Music
  Hongru Liang, Wenqiang Lei, P. Chan, Zhenglu Yang, Maosong Sun, Tat-Seng Chua | 64 · 41 · 0 | 16 Oct 2020
Masked Contrastive Representation Learning for Reinforcement Learning
  Jinhua Zhu, Yingce Xia, Lijun Wu, Jiajun Deng, Wen-gang Zhou, Tao Qin, Houqiang Li | SSL, OffRL | 110 · 60 · 0 | 15 Oct 2020
Decoding Methods for Neural Narrative Generation
  Alexandra DeLucia, Aaron Mueller, Xiang Lisa Li, João Sedoc | 62 · 26 · 0 | 14 Oct 2020
Explaining Creative Artifacts
  Lav Varshney, Nazneen Rajani, R. Socher | 129 · 2 · 0 | 14 Oct 2020
Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
  Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan | 59 · 20 · 0 | 14 Oct 2020
Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
  Gyuwan Kim, Kyunghyun Cho | 94 · 98 · 0 | 14 Oct 2020
Neural Databases
  James Thorne, Majid Yazdani, Marzieh Saeidi, Fabrizio Silvestri, Sebastian Riedel, A. Halevy | NAI | 96 · 9 · 0 | 14 Oct 2020
Pretrained Transformers for Text Ranking: BERT and Beyond
  Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates | VLM | 387 · 628 · 0 | 13 Oct 2020
MixCo: Mix-up Contrastive Learning for Visual Representation
  Sungnyun Kim, Gihun Lee, Sangmin Bae, Seyoung Yun | SSL | 167 · 81 · 0 | 13 Oct 2020
Improving Text Generation with Student-Forcing Optimal Transport
  Guoyin Wang, Chunyuan Li, Jianqiao Li, Hao Fu, Yuh-Chen Lin, ..., Ruiyi Zhang, Wenlin Wang, Dinghan Shen, Qian Yang, Lawrence Carin | OT | 78 · 18 · 0 | 12 Oct 2020
COMET-ATOMIC 2020: On Symbolic and Neural Commonsense Knowledge Graphs
  Jena D. Hwang, Chandra Bhagavatula, Ronan Le Bras, Jeff Da, Keisuke Sakaguchi, Antoine Bosselut, Yejin Choi | 81 · 415 · 0 | 12 Oct 2020
Neural, Symbolic and Neural-Symbolic Reasoning on Knowledge Graphs
  Jing Zhang, Bo Chen, Lingxi Zhang, Xirui Ke, Haipeng Ding | NAI | 97 · 3 · 0 | 12 Oct 2020
Learning Which Features Matter: RoBERTa Acquires a Preference for Linguistic Generalizations (Eventually)
  Alex Warstadt, Yian Zhang, Haau-Sing Li, Haokun Liu, Samuel R. Bowman | SSL, AI4CE | 78 · 21 · 0 | 11 Oct 2020
SMYRF: Efficient Attention using Asymmetric Clustering
  Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis | 101 · 46 · 0 | 11 Oct 2020
What causes the test error? Going beyond bias-variance via ANOVA
  Licong Lin, Yan Sun | 93 · 34 · 0 | 11 Oct 2020
AutoQA: From Databases To QA Semantic Parsers With Only Synthetic Training Data
  Silei Xu, Sina J. Semnani, Giovanni Campagna, M. Lam | 76 · 52 · 0 | 09 Oct 2020
On the importance of pre-training data volume for compact language models
  Vincent Micheli, Martin d'Hoffschmidt, François Fleuret | 67 · 42 · 0 | 08 Oct 2020
AxFormer: Accuracy-driven Approximation of Transformers for Faster, Smaller and more Accurate NLP Models
  Amrit Nagarajan, Sanchari Sen, Jacob R. Stevens, A. Raghunathan | 18 · 3 · 0 | 07 Oct 2020
A Mathematical Exploration of Why Language Models Help Solve Downstream Tasks
  Nikunj Saunshi, Sadhika Malladi, Sanjeev Arora | 87 · 89 · 0 | 07 Oct 2020
A ground-truth dataset and classification model for detecting bots in GitHub issue and PR comments
  M. Golzadeh, Alexandre Decan, Damien Legay, T. Mens | 57 · 78 · 0 | 07 Oct 2020
Representation Learning for Sequence Data with Deep Autoencoding Predictive Components
  Junwen Bai, Weiran Wang, Yingbo Zhou, Caiming Xiong | SSL, AI4TS | 75 · 12 · 0 | 07 Oct 2020
A Closer Look at Codistillation for Distributed Training
  Shagun Sodhani, Olivier Delalleau, Mahmoud Assran, Koustuv Sinha, Nicolas Ballas, Michael G. Rabbat | 129 · 8 · 0 | 06 Oct 2020
A Transformer-based Framework for Multivariate Time Series Representation Learning
  George Zerveas, Srideepika Jayaraman, Dhaval Patel, A. Bhamidipaty, Carsten Eickhoff | AI4TS | 109 · 940 · 0 | 06 Oct 2020
InfoBERT: Improving Robustness of Language Models from An Information Theoretic Perspective
  Wei Ping, Shuohang Wang, Yu Cheng, Zhe Gan, R. Jia, Yue Liu, Jingjing Liu | AAML | 215 · 116 · 0 | 05 Oct 2020