XLNet: Generalized Autoregressive Pretraining for Language Understanding
19 June 2019 · arXiv: 1906.08237 · AI4CE
Zhilin Yang, Zihang Dai, Yiming Yang, J. Carbonell, Ruslan Salakhutdinov, Quoc V. Le

Papers citing "XLNet: Generalized Autoregressive Pretraining for Language Understanding"

Showing 50 of 3,524 citing papers (title, authors, community tags, date):

Energon: Towards Efficient Acceleration of Transformers Using Dynamic Sparse Attention
  Zhe Zhou, Junling Liu, Zhenyu Gu, Guangyu Sun · 18 Oct 2021
Deep Transfer Learning & Beyond: Transformer Language Models in Information Systems Research
  Ross Gruetzemacher, D. Paradice · 18 Oct 2021
Quantifying the Task-Specific Information in Text-Based Classifications
  Zining Zhu, Aparna Balagopalan, Marzyeh Ghassemi, Frank Rudzicz · 17 Oct 2021
GNN-LM: Language Modeling based on Global Contexts via GNN
  Yuxian Meng, Shi Zong, Xiaoya Li, Xiaofei Sun, Tianwei Zhang, Leilei Gan, Jiwei Li · LRM · 17 Oct 2021
Improving Transformers with Probabilistic Attention Keys
  Tam Nguyen, T. Nguyen, Dung D. Le, Duy Khuong Nguyen, Viet-Anh Tran, Richard G. Baraniuk, Nhat Ho, Stanley J. Osher · 16 Oct 2021
On the Robustness of Reading Comprehension Models to Entity Renaming
  Jun Yan, Yang Xiao, Sagnik Mukherjee, Bill Yuchen Lin, Robin Jia, Xiang Ren · 16 Oct 2021
Seeking Patterns, Not just Memorizing Procedures: Contrastive Learning for Solving Math Word Problems
  Zhongli Li, Wenxuan Zhang, Chao Yan, Qingyu Zhou, Chao Li, Hongzhi Liu, Yunbo Cao · AIMat · 16 Oct 2021
A Short Study on Compressing Decoder-Based Language Models
  Tianda Li, Yassir El Mesbahi, I. Kobyzev, Ahmad Rashid, A. Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh · 16 Oct 2021
Prix-LM: Pretraining for Multilingual Knowledge Base Construction
  Wenxuan Zhou, Fangyu Liu, Ivan Vulić, Nigel Collier, Muhao Chen · KELM · 16 Oct 2021
EncT5: A Framework for Fine-tuning T5 as Non-autoregressive Models
  Frederick Liu, T. Huang, Shihang Lyu, Siamak Shakeri, Hongkun Yu, Jing Li · 16 Oct 2021
Detecting Gender Bias in Transformer-based Models: A Case Study on BERT
  Bingbing Li, Hongwu Peng, Rajat Sainju, Junhuan Yang, Lei Yang, Yueying Liang, Weiwen Jiang, Binghui Wang, Hang Liu, Caiwen Ding · 15 Oct 2021
The Dangers of Underclaiming: Reasons for Caution When Reporting How NLP Systems Fail
  Sam Bowman · OffRL · 15 Oct 2021
Kronecker Decomposition for GPT Compression
  Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh · 15 Oct 2021
Tracing Origins: Coreference-aware Machine Reading Comprehension
  Baorong Huang, Zhuosheng Zhang, Hai Zhao · 15 Oct 2021
SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer
  Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer · VLM · LRM · 15 Oct 2021
Attention-Free Keyword Spotting
  Mashrur M. Morshed, Ahmad Omar Ahsan · 14 Oct 2021
P-Tuning v2: Prompt Tuning Can Be Comparable to Fine-tuning Universally Across Scales and Tasks
  Xiao Liu, Kaixuan Ji, Yicheng Fu, Weng Lam Tam, Zhengxiao Du, Zhilin Yang, Jie Tang · VLM · 14 Oct 2021
Training Neural Networks for Solving 1-D Optimal Piecewise Linear Approximation
  Hangcheng Dong, Jing-Xiao Liao, Yan Wang, Yixin Chen, Bingguo Liu, Dong Ye, Guodong Liu · 14 Oct 2021
Transferring Semantic Knowledge Into Language Encoders
  Mohammad Umair, Francis Ferraro · 14 Oct 2021
Plug-Tagger: A Pluggable Sequence Labeling Framework Using Language Models
  Xin Zhou, Ruotian Ma, Tao Gui, Y. Tan, Qi Zhang, Xuanjing Huang · VLM · 14 Oct 2021
Building Chinese Biomedical Language Models via Multi-Level Text Discrimination
  Quan Wang, Songtai Dai, Benfeng Xu, Yajuan Lyu, Yong Zhu, Hua Wu, Haifeng Wang · 14 Oct 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
  Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, ..., Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei · 14 Oct 2021
Rethinking Self-Supervision Objectives for Generalizable Coherence Modeling
  Prathyusha Jwalapuram, Shafiq Joty, Xiang Lin · 14 Oct 2021
Interpreting the Robustness of Neural NLP Models to Textual Perturbations
  Yunxiang Zhang, Liangming Pan, Samson Tan, Min-Yen Kan · 14 Oct 2021
bert2BERT: Towards Reusable Pretrained Language Models
  Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu · VLM · 14 Oct 2021
Towards Efficient NLP: A Standard Evaluation and A Strong Baseline
  Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu · ELM · 13 Oct 2021
Automated Essay Scoring Using Transformer Models
  Sabrina Ludwig, Christian W. F. Mayer, Christopher Hansen, Kerstin Eilers, Steffen Brandt · 13 Oct 2021
Semantic Role Labeling as Dependency Parsing: Exploring Latent Tree Structures Inside Arguments
  Yu Zhang, Qingrong Xia, Shilin Zhou, Yong Jiang, Guohong Fu, Min Zhang · 13 Oct 2021
Mengzi: Towards Lightweight yet Ingenious Pre-trained Models for Chinese
  Zhuosheng Zhang, Hanqing Zhang, Keming Chen, Yuhang Guo, Jingyun Hua, Yulong Wang, Ming Zhou · VLM · 13 Oct 2021
MDERank: A Masked Document Embedding Rank Approach for Unsupervised Keyphrase Extraction
  Linhan Zhang, Qian Chen, Wen Wang, Chong Deng, Shiliang Zhang, Bing Li, Wei Wang, Xin Cao · 13 Oct 2021
Maximizing Efficiency of Language Model Pre-training for Learning Representation
  Junmo Kang, Suwon Shin, Jeonghwan Kim, Jae-Seung Jo, Sung-Hyon Myaeng · 13 Oct 2021
Dict-BERT: Enhancing Language Model Pre-training with Dictionary
  Wenhao Yu, Chenguang Zhu, Yuwei Fang, Donghan Yu, Shuohang Wang, Yichong Xu, Michael Zeng, Meng Jiang · 13 Oct 2021
Learning Compact Metrics for MT
  Amy Pu, Hyung Won Chung, Ankur P. Parikh, Sebastian Gehrmann, Thibault Sellam · 12 Oct 2021
Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects
  Charlotte Caucheteux, Alexandre Gramfort, J. King · 12 Oct 2021
Relative Molecule Self-Attention Transformer
  Lukasz Maziarka, Dawid Majchrowski, Tomasz Danel, Piotr Gaiński, Jacek Tabor, Igor T. Podolak, Pawel M. Morkisz, Stanislaw Jastrzebski · MedIm · 12 Oct 2021
SEPP: Similarity Estimation of Predicted Probabilities for Defending and Detecting Adversarial Text
  Hoang-Quoc Nguyen-Son, Seira Hidano, Kazuhide Fukushima, S. Kiyomoto · AAML · 12 Oct 2021
LightSeq2: Accelerated Training for Transformer-based Models on GPUs
  Xiaohui Wang, Yang Wei, Ying Xiong, Guyue Huang, Xian Qian, Yufei Ding, Mingxuan Wang, Lei Li · VLM · 12 Oct 2021
SignBERT: Pre-Training of Hand-Model-Aware Representation for Sign Language Recognition
  Hezhen Hu, Weichao Zhao, Wen-gang Zhou, Yuechen Wang, Houqiang Li · ViT · 11 Oct 2021
Supervision Exists Everywhere: A Data Efficient Contrastive Language-Image Pre-training Paradigm
  Yangguang Li, Feng Liang, Lichen Zhao, Yufeng Cui, Wanli Ouyang, Jing Shao, F. Yu, Junjie Yan · VLM · CLIP · 11 Oct 2021
Pre-trained Language Models in Biomedical Domain: A Systematic Survey
  Benyou Wang, Qianqian Xie, Jiahuan Pei, Zhihong Chen, Prayag Tiwari, Zhao Li, Jie Fu · LM&MA · AI4CE · 11 Oct 2021
Advances in Multi-turn Dialogue Comprehension: A Survey
  Zhuosheng Zhang, Hai Zhao · 11 Oct 2021
DCT: Dynamic Compressive Transformer for Modeling Unbounded Sequence
  Kai-Po Chang, Wei-Yun Ma · 10 Oct 2021
RPT: Toward Transferable Model on Heterogeneous Researcher Data via Pre-Training
  Ziyue Qiao, Yanjie Fu, Pengyang Wang, Meng Xiao, Zhiyuan Ning, Denghui Zhang, Yi Du, Yuanchun Zhou · 08 Oct 2021
Speeding up Deep Model Training by Sharing Weights and Then Unsharing
  Shuo Yang, Le Hou, Xiaodan Song, Qiang Liu, Denny Zhou · 08 Oct 2021
Using Keypoint Matching and Interactive Self Attention Network to verify Retail POSMs
  Harshita Seth, Sonaal Kant, Muktabh Mayank Srivastava · 07 Oct 2021
Noisy Text Data: Achilles' Heel of popular transformer based NLP models
  Kartikay Bagla, Ankit Kumar, Shivam Gupta, Anuj Gupta · 07 Oct 2021
A Comparative Study of Transformer-Based Language Models on Extractive Question Answering
  Kate Pearce, Tiffany Zhan, Aneesh Komanduri, J. Zhan · ELM · 07 Oct 2021
Capturing Structural Locality in Non-parametric Language Models
  Frank F. Xu, Junxian He, Graham Neubig, Vincent J. Hellendoorn · 06 Oct 2021
KNN-BERT: Fine-Tuning Pre-Trained Models with KNN Classifier
  Linyang Li, Demin Song, Ruotian Ma, Xipeng Qiu, Xuanjing Huang · 06 Oct 2021
Using Psuedolabels for training Sentiment Classifiers makes the model generalize better across datasets
  N. Reddy, Muktabh Mayank Srivastava · 05 Oct 2021