arXiv:1906.08237
XLNet: Generalized Autoregressive Pretraining for Language Understanding
19 June 2019
Zhilin Yang
Zihang Dai
Yiming Yang
J. Carbonell
Ruslan Salakhutdinov
Quoc V. Le
AI4CE
Papers citing "XLNet: Generalized Autoregressive Pretraining for Language Understanding" (50 of 3,522 shown)
SAS: Self-Augmentation Strategy for Language Model Pre-training
Yifei Xu
Jingqiao Zhang
Ru He
Liangzhu Ge
Chao Yang
Cheng Yang
Ying Wu
59
1
0
14 Jun 2021
Pre-Trained Models: Past, Present and Future
Xu Han
Zhengyan Zhang
Ning Ding
Yuxian Gu
Xiao Liu
...
Jie Tang
Ji-Rong Wen
Jinhui Yuan
Wayne Xin Zhao
Jun Zhu
AIFin
MQ
AI4MH
179
865
0
14 Jun 2021
Why Can You Lay Off Heads? Investigating How BERT Heads Transfer
Ting-Rui Chiang
Yun-Nung Chen
38
0
0
14 Jun 2021
Target Model Agnostic Adversarial Attacks with Query Budgets on Language Understanding Models
Jatin Chauhan
Karan Bhukar
Manohar Kaul
AAML
39
1
0
13 Jun 2021
The DEformer: An Order-Agnostic Distribution Estimating Transformer
Michael A. Alcorn
Anh Totti Nguyen
30
4
0
13 Jun 2021
InfoBehavior: Self-supervised Representation Learning for Ultra-long Behavior Sequence via Hierarchical Grouping
Runshi Liu
Pengda Qin
Yuhong Li
Weigao Wen
Dong Li
Kefeng Deng
Qiang Wu
AI4TS
56
0
0
13 Jun 2021
Can Transformer Language Models Predict Psychometric Properties?
Antonio Laverghetta
Animesh Nighojkar
Jamshidbek Mirzakhalov
John Licato
LM&MA
71
14
0
12 Jun 2021
Neural Combinatory Constituency Parsing
Zhousi Chen
Longtu Zhang
Aizhan Imankulova
Mamoru Komachi
75
2
0
12 Jun 2021
Leveraging Pre-trained Language Model for Speech Sentiment Analysis
Suwon Shon
Pablo Brusco
Jing Pan
Kyu Jeong Han
Shinji Watanabe
61
17
0
11 Jun 2021
What Can Knowledge Bring to Machine Learning? -- A Survey of Low-shot Learning for Structured Data
Yang Hu
Adriane P. Chapman
Guihua Wen
Dame Wendy Hall
100
25
0
11 Jun 2021
Bridging Subword Gaps in Pretrain-Finetune Paradigm for Natural Language Generation
Xin Liu
Baosong Yang
Dayiheng Liu
Haibo Zhang
Weihua Luo
Min Zhang
Haiying Zhang
Jinsong Su
63
18
0
11 Jun 2021
RefBERT: Compressing BERT by Referencing to Pre-computed Representations
Xinyi Wang
Haiqing Yang
Liang Zhao
Yang Mo
Jianping Shen
MQ
80
4
0
11 Jun 2021
Causal Analysis of Syntactic Agreement Mechanisms in Neural Language Models
Matthew Finlayson
Aaron Mueller
Sebastian Gehrmann
Stuart M. Shieber
Tal Linzen
Yonatan Belinkov
132
110
0
10 Jun 2021
A Semi-supervised Multi-task Learning Approach to Classify Customer Contact Intents
Li Dong
Matthew C. Spencer
Amir Biagi
52
3
0
10 Jun 2021
CAT: Cross Attention in Vision Transformer
Hezheng Lin
Xingyi Cheng
Xiangyu Wu
Fan Yang
Dong Shen
Zhongyuan Wang
Qing Song
Wei Yuan
ViT
65
158
0
10 Jun 2021
Exploring Unsupervised Pretraining Objectives for Machine Translation
Christos Baziotis
Ivan Titov
Alexandra Birch
Barry Haddow
AAML
AI4CE
51
8
0
10 Jun 2021
MusicBERT: Symbolic Music Understanding with Large-Scale Pre-Training
Mingliang Zeng
Xu Tan
Rui Wang
Zeqian Ju
Tao Qin
Tie-Yan Liu
70
136
0
10 Jun 2021
Convolutions and Self-Attention: Re-interpreting Relative Positions in Pre-trained Language Models
Tyler A. Chang
Yifan Xu
Weijian Xu
Zhuowen Tu
ViT
57
15
0
10 Jun 2021
Semantic-aware Binary Code Representation with BERT
Hyungjoon Koo
Soyeon Park
Daejin Choi
Taesoo Kim
64
24
0
10 Jun 2021
Variational Information Bottleneck for Effective Low-Resource Fine-Tuning
Rabeeh Karimi Mahabadi
Yonatan Belinkov
James Henderson
DRL
76
76
0
10 Jun 2021
Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud
Jashwant Raj Gunasekaran
Cyan Subhra Mishra
P. Thinakaran
M. Kandemir
Chita R. Das
39
3
0
09 Jun 2021
Bayesian Attention Belief Networks
Shujian Zhang
Xinjie Fan
Bo Chen
Mingyuan Zhou
BDL
110
32
0
09 Jun 2021
Instantaneous Grammatical Error Correction with Shallow Aggressive Decoding
Xin Sun
Tao Ge
Furu Wei
Houfeng Wang
103
64
0
09 Jun 2021
Phraseformer: Multimodal Key-phrase Extraction using Transformer and Graph Embedding
Narjes Nikzad Khasmakhi
M. Feizi-Derakhshi
M. Asgari-Chenaghlu
M. Balafar
Ali Reza Feizi Derakhshi
Taymaz Rahkar-Farshi
Majid Ramezani
Zoleikha Jahanbakhsh-Nagadeh
E. Zafarani-Moattar
Mehrdad Ranjbar-Khadivi
60
23
0
09 Jun 2021
Neural Supervised Domain Adaptation by Augmenting Pre-trained Models with Random Units
Sara Meftah
N. Semmar
Y. Tamaazousti
H. Essafi
F. Sadat
62
3
0
09 Jun 2021
Automatic Sexism Detection with Multilingual Transformer Models
Mina Schütz
Jaqueline Boeck
Daria Liakhovets
D. Slijepcevic
Armin Kirchknopf
Manuel Hecht
Johannes Bogensperger
S. Schlarb
Alexander Schindler
Matthias Zeppelzauer
38
29
0
09 Jun 2021
VALUE: A Multi-Task Benchmark for Video-and-Language Understanding Evaluation
Linjie Li
Jie Lei
Zhe Gan
Licheng Yu
Yen-Chun Chen
...
Tamara L. Berg
Joey Tianyi Zhou
Jingjing Liu
Lijuan Wang
Zicheng Liu
VLM
121
103
0
08 Jun 2021
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
Subhabrata Mukherjee
Ahmed Hassan Awadallah
Jianfeng Gao
59
22
0
08 Jun 2021
Incorporating NODE with Pre-trained Neural Differential Operator for Learning Dynamics
Shiqi Gong
Qi Meng
Yue Wang
Lijun Wu
Wei Chen
Zhi-Ming Ma
Tie-Yan Liu
57
4
0
08 Jun 2021
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning
Piotr Piękos
Henryk Michalewski
Mateusz Malinowski
76
28
0
07 Jun 2021
Diverse Pretrained Context Encodings Improve Document Translation
Domenic Donato
Lei Yu
Chris Dyer
54
16
0
07 Jun 2021
RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models
Xin Guo
Jianlei Yang
Haoyi Zhou
Xucheng Ye
Jianxin Li
57
1
0
07 Jun 2021
BERTGEN: Multi-task Generation through BERT
Faidon Mitzalis
Ozan Caglayan
Pranava Madhyastha
Lucia Specia
VLM
48
7
0
07 Jun 2021
Meta-learning for downstream aware and agnostic pretraining
Hongyin Luo
Shuyan Dong
Yung-Sung Chuang
Shang-Wen Li
62
0
0
06 Jun 2021
Learnable Fourier Features for Multi-Dimensional Spatial Positional Encoding
Yang Li
Si Si
Gang Li
Cho-Jui Hsieh
Samy Bengio
102
96
0
05 Jun 2021
Weakly-Supervised Methods for Suicide Risk Assessment: Role of Related Domains
Chenghao Yang
Yudong Zhang
Smaranda Muresan
AI4MH
55
5
0
05 Jun 2021
Exposing the Implicit Energy Networks behind Masked Language Models via Metropolis--Hastings
Kartik Goyal
Chris Dyer
Taylor Berg-Kirkpatrick
178
51
0
04 Jun 2021
Layered gradient accumulation and modular pipeline parallelism: fast and efficient training of large language models
J. Lamy-Poirier
MoE
146
8
0
04 Jun 2021
ERNIE-Tiny : A Progressive Distillation Framework for Pretrained Transformer Compression
Weiyue Su
Xuyi Chen
Shi Feng
Jiaxiang Liu
Weixin Liu
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
81
13
0
04 Jun 2021
Language Scaling for Universal Suggested Replies Model
Qianlan Ying
Payal Bajaj
Budhaditya Deb
Yu Yang
Wei Wang
Bojia Lin
Milad Shokouhi
Xia Song
Yang Yang
Daxin Jiang
LRM
58
2
0
04 Jun 2021
Self-supervised Dialogue Learning for Spoken Conversational Question Answering
Nuo Chen
Chenyu You
Yuexian Zou
SSL
93
34
0
04 Jun 2021
Defending Democracy: Using Deep Learning to Identify and Prevent Misinformation
Anusua Trivedi
Alyssa Suhm
Prathamesh Mahankal
Subhiksha Mukuntharaj
Meghana D. Parab
Malvika Mohan
Meredith Berger
Arathi Sethumadhavan
A. Jaiman
Rahul Dodhia
117
0
0
03 Jun 2021
Defending Against Backdoor Attacks in Natural Language Generation
Xiaofei Sun
Xiaoya Li
Yuxian Meng
Xiang Ao
Leilei Gan
Jiwei Li
Tianwei Zhang
AAML
SILM
103
52
0
03 Jun 2021
TVDIM: Enhancing Image Self-Supervised Pretraining via Noisy Text Data
Pengda Qin
Yuhong Li
Kefeng Deng
Qiang Wu
30
1
0
03 Jun 2021
Fingerprinting Fine-tuned Language Models in the Wild
Nirav Diwan
Tanmoy Chakraborty
Zubair Shafiq
DeLMO
31
12
0
03 Jun 2021
Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
Matthew A. Wright
Joseph E. Gonzalez
86
23
0
02 Jun 2021
Multilingual Medical Question Answering and Information Retrieval for Rural Health Intelligence Access
Vishal Vinod
Susmit Agrawal
V. Gaurav
R. Pallavi
Savita Choudhary
22
3
0
02 Jun 2021
Topic-Aware Evidence Reasoning and Stance-Aware Aggregation for Fact Verification
Jiasheng Si
Deyu Zhou
Tong Li
Xingyu Shi
Yulan He
71
39
0
02 Jun 2021
Hi-Transformer: Hierarchical Interactive Transformer for Efficient and Effective Long Document Modeling
Chuhan Wu
Fangzhao Wu
Tao Qi
Yongfeng Huang
121
68
0
02 Jun 2021
One Teacher is Enough? Pre-trained Language Model Distillation from Multiple Teachers
Chuhan Wu
Fangzhao Wu
Yongfeng Huang
72
65
0
02 Jun 2021