ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Language Models are Few-Shot Learners (arXiv:2005.14165)

28 May 2020
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
Prafulla Dhariwal
Arvind Neelakantan
Pranav Shyam
Girish Sastry
Amanda Askell
Sandhini Agarwal
Ariel Herbert-Voss
Gretchen Krueger
T. Henighan
R. Child
Aditya A. Ramesh
Daniel M. Ziegler
Jeff Wu
Clemens Winter
Christopher Hesse
Mark Chen
Eric Sigler
Mateusz Litwin
Scott Gray
B. Chess
Jack Clark
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
    BDL

Papers citing "Language Models are Few-Shot Learners"

50 / 11,214 papers shown
Computer-Aided Design as Language
Yaroslav Ganin
Sergey Bartunov
Yujia Li
E. Keller
Stefano Saliceti
3DV
104
90
0
06 May 2021
Reliability Testing for Natural Language Processing Systems
Samson Tan
Chenyu You
K. Baxter
Araz Taeihagh
G. Bennett
Min-Yen Kan
15
38
0
06 May 2021
A Unified Transferable Model for ML-Enhanced DBMS
Ziniu Wu
Pei Yu
Peilun Yang
Rong Zhu
Yuxing Han
Yaliang Li
Defu Lian
K. Zeng
Jingren Zhou
42
31
0
06 May 2021
Training Quantum Embedding Kernels on Near-Term Quantum Computers
T. Hubregtsen
David Wierichs
Elies Gil-Fuster
Peter-Jan H. S. Derks
Paul K. Faehrmann
Johannes Jakob Meyer
28
95
0
05 May 2021
Rethinking Search: Making Domain Experts out of Dilettantes
Donald Metzler
Yi Tay
Dara Bahri
Marc Najork
LRM
38
46
0
05 May 2021
Image Embedding and Model Ensembling for Automated Chest X-Ray Interpretation
E. Giacomello
P. Lanzi
Daniele Loiacono
Luca Nassano
17
5
0
05 May 2021
TransHash: Transformer-based Hamming Hashing for Efficient Image Retrieval
Yongbiao Chen
Shenmin Zhang
Fangxin Liu
Zhigang Chang
Mang Ye
Zhengwei Qi (Shanghai Jiao Tong University)
ViT
29
50
0
05 May 2021
HerBERT: Efficiently Pretrained Transformer-based Language Model for Polish
Robert Mroczkowski
Piotr Rybak
Alina Wróblewska
Ireneusz Gawlik
36
81
0
04 May 2021
Inferring the Reader: Guiding Automated Story Generation with Commonsense Reasoning
Xiangyu Peng
Siyan Li
Sarah Wiegreffe
Mark O. Riedl
LRM
50
38
0
04 May 2021
One Model to Rule them All: Towards Zero-Shot Learning for Databases
Benjamin Hilprecht
Carsten Binnig
VLM
29
33
0
03 May 2021
Data-driven Weight Initialization with Sylvester Solvers
Debasmit Das
Yash Bhalgat
Fatih Porikli
ODL
38
3
0
02 May 2021
Stealthy Backdoors as Compression Artifacts
Yulong Tian
Fnu Suya
Fengyuan Xu
David Evans
35
22
0
30 Apr 2021
Scaling End-to-End Models for Large-Scale Multilingual ASR
Bo-wen Li
Ruoming Pang
Tara N. Sainath
Anmol Gulati
Yu Zhang
James Qin
Parisa Haghani
Yifan Jiang
Min Ma
Junwen Bai
CLL
34
76
0
30 Apr 2021
Entailment as Few-Shot Learner
Sinong Wang
Han Fang
Madian Khabsa
Hanzi Mao
Hao Ma
35
183
0
29 Apr 2021
MineGAN++: Mining Generative Models for Efficient Knowledge Transfer to Limited Data Domains
Yaxing Wang
Abel Gonzalez-Garcia
Chenshen Wu
Luis Herranz
Fahad Shahbaz Khan
Shangling Jui
Joost van de Weijer
32
6
0
28 Apr 2021
Sifting out the features by pruning: Are convolutional networks the winning lottery ticket of fully connected ones?
Franco Pellegrini
Giulio Biroli
49
6
0
27 Apr 2021
Shellcode_IA32: A Dataset for Automatic Shellcode Generation
Pietro Liguori
Erfan Al-Hossami
Domenico Cotroneo
R. Natella
B. Cukic
Samira Shaikh
34
27
0
27 Apr 2021
AT-ST: Self-Training Adaptation Strategy for OCR in Domains with Limited Transcriptions
M. Kišš
Karel Beneš
Michal Hradiš
64
13
0
27 Apr 2021
If your data distribution shifts, use self-learning
E. Rusak
Steffen Schneider
George Pachitariu
L. Eck
Peter V. Gehler
Oliver Bringmann
Wieland Brendel
Matthias Bethge
VLM
OOD
TTA
81
30
0
27 Apr 2021
One Billion Audio Sounds from GPU-enabled Modular Synthesis
Joseph P. Turian
Jordie Shier
George Tzanetakis
K. McNally
Max Henry
21
22
0
27 Apr 2021
PanGu-α: Large-scale Autoregressive Pretrained Chinese Language Models with Auto-parallel Computation
Wei Zeng
Xiaozhe Ren
Teng Su
Hui Wang
Yi-Lun Liao
...
Gaojun Fan
Yaowei Wang
Xuefeng Jin
Qun Liu
Yonghong Tian
ALM
MoE
AI4CE
35
212
0
26 Apr 2021
Generating abstractive summaries of Lithuanian news articles using a transformer model
Lukas Stankevicius
M. Lukoševičius
24
2
0
23 Apr 2021
Partitioning sparse deep neural networks for scalable training and inference
G. Demirci
Hakan Ferhatosmanoglu
20
11
0
23 Apr 2021
Literature review on vulnerability detection using NLP technology
Jiajie Wu
39
14
0
23 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
63
1,224
0
22 Apr 2021
Understanding and Avoiding AI Failures: A Practical Guide
R. M. Williams
Roman V. Yampolskiy
30
24
0
22 Apr 2021
All Tokens Matter: Token Labeling for Training Better Vision Transformers
Zihang Jiang
Qibin Hou
Li-xin Yuan
Daquan Zhou
Yujun Shi
Xiaojie Jin
Anran Wang
Jiashi Feng
ViT
27
204
0
22 Apr 2021
Provable Limitations of Acquiring Meaning from Ungrounded Form: What Will Future Language Models Understand?
William Merrill
Yoav Goldberg
Roy Schwartz
Noah A. Smith
25
67
0
22 Apr 2021
ScaleCom: Scalable Sparsified Gradient Compression for Communication-Efficient Distributed Training
Chia-Yu Chen
Jiamin Ni
Songtao Lu
Xiaodong Cui
Pin-Yu Chen
...
Naigang Wang
Swagath Venkataramani
Vijayalakshmi Srinivasan
Wei Zhang
K. Gopalakrishnan
29
66
0
21 Apr 2021
Adapting Long Context NLM for ASR Rescoring in Conversational Agents
Ashish Shenoy
S. Bodapati
Monica Sunkara
S. Ronanki
Katrin Kirchhoff
31
21
0
21 Apr 2021
RoFormer: Enhanced Transformer with Rotary Position Embedding
Jianlin Su
Yu Lu
Shengfeng Pan
Ahmed Murtadha
Bo Wen
Yunfeng Liu
46
2,190
0
20 Apr 2021
BERTić – The Transformer Language Model for Bosnian, Croatian, Montenegrin and Serbian
N. Ljubešić
D. Lauc
16
48
0
19 Apr 2021
A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
Yifei Ding
M. Jia
Qiuhua Miao
Yudong Cao
16
268
0
19 Apr 2021
An Oracle for Guiding Large-Scale Model/Hybrid Parallel Training of Convolutional Neural Networks
A. Kahira
Truong Thao Nguyen
L. Bautista-Gomez
Ryousei Takano
Rosa M. Badia
M. Wahib
15
9
0
19 Apr 2021
Data-Efficient Language-Supervised Zero-Shot Learning with Self-Distillation
Rui Cheng
Bichen Wu
Peizhao Zhang
Peter Vajda
Joseph E. Gonzalez
CLIP
VLM
21
31
0
18 Apr 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
223
180
0
18 Apr 2021
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
Kang Min Yoo
Dongju Park
Jaewook Kang
Sang-Woo Lee
Woomyeong Park
36
235
0
18 Apr 2021
Fantastically Ordered Prompts and Where to Find Them: Overcoming Few-Shot Prompt Order Sensitivity
Yao Lu
Max Bartolo
Alastair Moore
Sebastian Riedel
Pontus Stenetorp
AILaw
LRM
279
1,125
0
18 Apr 2021
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Swaroop Mishra
Daniel Khashabi
Chitta Baral
Hannaneh Hajishirzi
LRM
60
719
0
18 Apr 2021
Cross-Attention is All You Need: Adapting Pretrained Transformers for Machine Translation
Mozhdeh Gheini
Xiang Ren
Jonathan May
LRM
31
105
0
18 Apr 2021
A Token-level Reference-free Hallucination Detection Benchmark for Free-form Text Generation
Tianyu Liu
Yizhe Zhang
Chris Brockett
Yi Mao
Zhifang Sui
Weizhu Chen
W. Dolan
HILM
228
144
0
18 Apr 2021
ScaleFreeCTR: MixCache-based Distributed Training System for CTR Models with Huge Embedding Table
Huifeng Guo
Wei Guo
Yong Gao
Ruiming Tang
Xiuqiang He
Wenzhi Liu
38
20
0
17 Apr 2021
Data Distillation for Text Classification
Yongqi Li
Wenjie Li
DD
27
28
0
17 Apr 2021
On the Importance of Effectively Adapting Pretrained Language Models for Active Learning
Katerina Margatina
Loïc Barrault
Nikolaos Aletras
27
36
0
16 Apr 2021
What to Pre-Train on? Efficient Intermediate Task Selection
Clifton A. Poth
Jonas Pfeiffer
Andreas Rucklé
Iryna Gurevych
21
94
0
16 Apr 2021
Editing Factual Knowledge in Language Models
Nicola De Cao
Wilker Aziz
Ivan Titov
KELM
68
476
0
16 Apr 2021
Back to Square One: Artifact Detection, Training and Commonsense Disentanglement in the Winograd Schema
Yanai Elazar
Hongming Zhang
Yoav Goldberg
Dan Roth
ReLM
LRM
45
44
0
16 Apr 2021
Language Models are Few-Shot Butlers
Vincent Micheli
François Fleuret
25
31
0
16 Apr 2021
Probing Across Time: What Does RoBERTa Know and When?
Leo Z. Liu
Yizhong Wang
Jungo Kasai
Hannaneh Hajishirzi
Noah A. Smith
KELM
13
80
0
16 Apr 2021
ZeRO-Infinity: Breaking the GPU Memory Wall for Extreme Scale Deep Learning
Samyam Rajbhandari
Olatunji Ruwase
Jeff Rasley
Shaden Smith
Yuxiong He
GNN
41
370
0
16 Apr 2021