1910.10683
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
23 October 2019
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu
Papers citing "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer" (showing 50 of 9,843)
Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech
  Mihir Kale, Scott Roy · 05 Apr 2020 · 51 / 14 / 0

A Hierarchical Network for Abstractive Meeting Summarization with Cross-Domain Pretraining
  Chenguang Zhu, Ruochen Xu, Michael Zeng, Xuedong Huang [BDL, AI4TS] · 04 Apr 2020 · 89 / 18 / 0

Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models
  Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy J. Lin · 04 Apr 2020 · 53 / 82 / 0

Abstractive Summarization with Combination of Pre-trained Sequence-to-Sequence and Saliency Models
  Itsumi Saito, Kyosuke Nishida, Kosuke Nishida, J. Tomita · 29 Mar 2020 · 77 / 29 / 0

TLDR: Token Loss Dynamic Reweighting for Reducing Repetitive Utterance Generation
  Shaojie Jiang, Thomas Wolf, Christof Monz, Maarten de Rijke · 26 Mar 2020 · 59 / 12 / 0

A Survey of Deep Learning for Scientific Discovery
  M. Raghu, Erica Schmidt [OOD, AI4CE] · 26 Mar 2020 · 179 / 123 / 0

Felix: Flexible Text Editing Through Tagging and Insertion
  Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido · 24 Mar 2020 · 82 / 76 / 0

Word2Vec: Optimal Hyper-Parameters and Their Impact on NLP Downstream Tasks
  Tosin Adewumi, F. Liwicki, Marcus Liwicki [VLM] · 23 Mar 2020 · 42 / 20 / 0

TTTTTackling WinoGrande Schemas
  Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin · 18 Mar 2020 · 55 / 6 / 0

Pre-trained Models for Natural Language Processing: A Survey
  Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang [LM&MA, VLM] · 18 Mar 2020 · 388 / 1,498 / 0

A Survey on Contextual Embeddings
  Qi Liu, Matt J. Kusner, Phil Blunsom · 16 Mar 2020 · 274 / 151 / 0

TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
  Zhiheng Huang, Peng Xu, Davis Liang, Ajay K. Mishra, Bing Xiang · 16 Mar 2020 · 40 / 31 / 0

Document Ranking with a Pretrained Sequence-to-Sequence Model
  Rodrigo Nogueira, Zhiying Jiang, Jimmy J. Lin · 14 Mar 2020 · 102 / 587 / 0

Learning to Encode Position for Transformer with Continuous Dynamical Model
  Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh · 13 Mar 2020 · 85 / 112 / 0

ReZero is All You Need: Fast Convergence at Large Depth
  Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley [AI4CE] · 10 Mar 2020 · 89 / 283 / 0

Adaptive Name Entity Recognition under Highly Unbalanced Data
  Thong Nguyen, Duy Nguyen, Pramod Rao · 10 Mar 2020 · 48 / 10 / 0

Talking-Heads Attention
  Noam M. Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, L. Hou · 05 Mar 2020 · 145 / 80 / 0

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
  Yada Pruksachatkun, Philip Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Jinpeng Wang, Ian Tenney, Samuel R. Bowman [SSeg] · 04 Mar 2020 · 36 / 94 / 0

Anchor Attention for Hybrid Crowd Forecasts Aggregation
  Yuzhong Huang, A. Abeliuk, Fred Morstatter, P. Atanasov, Aram Galstyan · 03 Mar 2020 · 47 / 3 / 0

Deep Learning in Memristive Nanowire Networks
  Jack D. Kendall, Ross D. Pantone, J. Nino · 03 Mar 2020 · 21 / 2 / 0

CLUECorpus2020: A Large-scale Chinese Corpus for Pre-training Language Model
  Liang Xu, Xuanwei Zhang, Qianqian Dong [SSL] · 03 Mar 2020 · 66 / 71 / 0

Med7: a transferable clinical natural language processing model for electronic health records
  Andrey Kormilitzin, N. Vaci, Qiang Liu, A. Nevado-Holgado · 03 Mar 2020 · 97 / 120 / 0

AraBERT: Transformer-based Model for Arabic Language Understanding
  Wissam Antoun, Fady Baly, Hazem M. Hajj · 28 Feb 2020 · 162 / 975 / 0

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
  Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, ..., Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, H. Hon [AI4CE] · 28 Feb 2020 · 88 / 397 / 0

Few-shot Natural Language Generation for Task-Oriented Dialog
  Baolin Peng, Chenguang Zhu, Chunyuan Li, Xiujun Li, Jinchao Li, Michael Zeng, Jianfeng Gao · 27 Feb 2020 · 91 / 201 / 0

A Primer in BERTology: What we know about how BERT works
  Anna Rogers, Olga Kovaleva, Anna Rumshisky [OffRL] · 27 Feb 2020 · 137 / 1,510 / 0

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
  Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett [AI4CE] · 27 Feb 2020 · 134 / 201 / 0

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
  Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez · 26 Feb 2020 · 138 / 151 / 0

On Feature Normalization and Data Augmentation
  Boyi Li, Felix Wu, Ser-Nam Lim, Serge J. Belongie, Kilian Q. Weinberger · 25 Feb 2020 · 56 / 137 / 0

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
  Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou [VLM] · 25 Feb 2020 · 227 / 1,285 / 0

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
  Yige Xu, Xipeng Qiu, L. Zhou, Xuanjing Huang · 24 Feb 2020 · 83 / 67 / 0

Training Question Answering Models From Synthetic Data
  Raul Puri, Ryan Spring, M. Patwary, Mohammad Shoeybi, Bryan Catanzaro [ELM] · 22 Feb 2020 · 81 / 160 / 0

Fast local linear regression with anchor regularization
  Mathis Petrovich, M. Yamada [OffRL] · 21 Feb 2020 · 29 / 3 / 0

CodeBERT: A Pre-Trained Model for Programming and Natural Languages
  Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, ..., Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, Ming Zhou · 19 Feb 2020 · 208 / 2,714 / 0

LAMBERT: Layout-Aware (Language) Modeling for information extraction
  Lukasz Garncarek, Rafal Powalski, Tomasz Stanislawek, Bartosz Topolski, Piotr Halama, M. Turski, Filip Graliński · 19 Feb 2020 · 84 / 88 / 0

The Microsoft Toolkit of Multi-Task Deep Neural Networks for Natural Language Understanding
  Xiaodong Liu, Yu Wang, Jianshu Ji, Hao Cheng, Xueyun Zhu, ..., Pengcheng He, Weizhu Chen, Hoifung Poon, Guihong Cao, Jianfeng Gao [AI4CE] · 19 Feb 2020 · 77 / 61 / 0

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation
  Huaishao Luo, Lei Ji, Botian Shi, Haoyang Huang, Nan Duan, Tianrui Li, Jason Li, Xilin Chen, Ming Zhou [VLM] · 15 Feb 2020 · 126 / 438 / 0

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
  Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith · 15 Feb 2020 · 103 / 598 / 0

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
  Wenhao Lu, Jian Jiao, Ruofei Zhang · 14 Feb 2020 · 60 / 50 / 0

Transformer on a Diet
  Chenguang Wang, Zihao Ye, Aston Zhang, Zheng Zhang, Alex Smola · 14 Feb 2020 · 80 / 8 / 0

GLU Variants Improve Transformer
  Noam M. Shazeer · 12 Feb 2020 · 177 / 1,026 / 0

How Much Knowledge Can You Pack Into the Parameters of a Language Model?
  Adam Roberts, Colin Raffel, Noam M. Shazeer [KELM] · 10 Feb 2020 · 144 / 898 / 0

REALM: Retrieval-Augmented Language Model Pre-Training
  Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, Ming-Wei Chang [RALM] · 10 Feb 2020 · 151 / 2,126 / 0

Semi-Supervised Class Discovery
  Jeremy Nixon, J. Liu, David Berthelot · 10 Feb 2020 · 65 / 2 / 0

Momentum Improves Normalized SGD
  Ashok Cutkosky, Harsh Mehta [ODL] · 09 Feb 2020 · 107 / 128 / 0

Segmented Graph-Bert for Graph Instance Modeling
  Jiawei Zhang [SSeg] · 09 Feb 2020 · 60 / 6 / 0

Description Based Text Classification with Reinforcement Learning
  Duo Chai, Wei Wu, Qinghong Han, Leilei Gan, Jiwei Li [VLM] · 08 Feb 2020 · 181 / 68 / 0

K-Adapter: Infusing Knowledge into Pre-Trained Models with Adapters
  Ruize Wang, Duyu Tang, Nan Duan, Zhongyu Wei, Xuanjing Huang, Jianshu Ji, Guihong Cao, Daxin Jiang, Ming Zhou [KELM] · 05 Feb 2020 · 135 / 557 / 0

DUMA: Reading Comprehension with Transposition Thinking
  Pengfei Zhu, Hai Zhao, Xiaoguang Li [AI4CE] · 26 Jan 2020 · 82 / 35 / 0

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
  Dongling Xiao, Han Zhang, Yukun Li, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang · 26 Jan 2020 · 85 / 127 / 0