1909.11942
Cited By
v6 (latest)
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
Github (3271★)
Papers citing
"ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"
50 / 2,935 papers shown
Title
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang
A. Khetan
Rene Bidart
Zohar Karnin
53
16
0
27 Mar 2022
Lite Unified Modeling for Discriminative Reading Comprehension
Yilin Zhao
Hai Zhao
Libin Shen
Yinggong Zhao
86
2
0
26 Mar 2022
On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations
Yang Trista Cao
Yada Pruksachatkun
Kai-Wei Chang
Rahul Gupta
Varun Kumar
Jwala Dhamala
Aram Galstyan
74
99
0
25 Mar 2022
A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data
C. Meaney
Wali Hakimpour
S. Kalia
R. Moineddin
37
7
0
25 Mar 2022
Token Dropping for Efficient BERT Pretraining
Le Hou
Richard Yuanzhe Pang
Dinesh Manocha
Yuexin Wu
Xinying Song
Xiaodan Song
Denny Zhou
85
46
0
24 Mar 2022
minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models
Kanishka Misra
84
63
0
24 Mar 2022
Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction
M. Tarnavskyi
Artem Chernodub
Kostiantyn Omelianchuk
3DV
59
26
0
24 Mar 2022
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Yuxiang Wu
Matt Gardner
Pontus Stenetorp
Pradeep Dasigi
95
68
0
24 Mar 2022
FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization
Kecheng Zheng
Yang Cao
Kai Zhu
Ruijing Zhao
Zhengjun Zha
80
6
0
24 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
67
4
0
24 Mar 2022
Linearizing Transformer with Key-Value Memory
Yizhe Zhang
Deng Cai
120
6
0
23 Mar 2022
ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention
Yang Liu
Jiaxiang Liu
L. Chen
Yuxiang Lu
Shi Feng
Zhida Feng
Yu Sun
Hao Tian
Huancheng Wu
Hai-feng Wang
70
9
0
23 Mar 2022
Transformer based ensemble for emotion detection
Aditya Kane
Shantanu Patankar
Sahil Khose
Neeraja Kirtane
73
10
0
22 Mar 2022
Reinforcement-based frugal learning for satellite image change detection
Sebastien Deschamps
H. Sahbi
69
1
0
22 Mar 2022
Frugal Learning of Virtual Exemplars for Label-Efficient Satellite Image Change Detection
H. Sahbi
Sebastien Deschamps
60
0
0
22 Mar 2022
Task-guided Disentangled Tuning for Pretrained Language Models
Jiali Zeng
Yu Jiang
Shuangzhi Wu
Yongjing Yin
Mu Li
DRL
150
3
0
22 Mar 2022
Language modeling via stochastic processes
Rose E. Wang
Esin Durmus
Noah D. Goodman
Tatsunori Hashimoto
BDL
AI4TS
117
25
0
21 Mar 2022
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
Zheng Li
Zijian Wang
Ming Tan
Ramesh Nallapati
Parminder Bhatia
Andrew O. Arnold
Bing Xiang
Dan Roth
MQ
78
44
0
21 Mar 2022
Masked Discrimination for Self-Supervised Learning on Point Clouds
Haotian Liu
Mu Cai
Yong Jae Lee
3DPC
126
172
0
21 Mar 2022
TCM-SD: A Benchmark for Probing Syndrome Differentiation via Natural Language Processing
Mucheng Ren
Heyan Huang
Yuxiang Zhou
Qianwen Cao
Yu Bu
Yang Gao
53
11
0
21 Mar 2022
Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
80
104
0
21 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová
Marian Verhelst
92
5
0
20 Mar 2022
Cluster & Tune: Boost Cold Start Performance in Text Classification
Eyal Shnarch
Ariel Gera
Alon Halfon
Lena Dankin
Leshem Choshen
R. Aharonov
Noam Slonim
67
22
0
20 Mar 2022
How does the pre-training objective affect what large language models learn about linguistic properties?
Ahmed Alajrami
Nikolaos Aletras
82
20
0
20 Mar 2022
Clickbait Spoiling via Question Answering and Passage Retrieval
Matthias Hagen
Maik Fröbe
Artur Jurk
Martin Potthast
92
35
0
19 Mar 2022
Learning Compressed Embeddings for On-Device Inference
Niketan Pansare
J. Katukuri
Aditya Arora
F. Cipollone
R. Shaik
Noyan Tokgozoglu
Chandru Venkataraman
101
15
0
18 Mar 2022
elBERto: Self-supervised Commonsense Learning for Question Answering
Xunlin Zhan
Yuan Li
Xiao Dong
Xiaodan Liang
Zhiting Hu
Lawrence Carin
SSL
RALM
LRM
77
8
0
17 Mar 2022
Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss
Yantao Gong
Cao Liu
Fan Yang
Xunliang Cai
Guanglu Wan
Jiansong Chen
Weipeng Zhang
Houfeng Wang
UQCV
60
2
0
17 Mar 2022
RoMe: A Robust Metric for Evaluating Natural Language Generation
Md. Rony
Liubov Kovriguina
Debanjan Chaudhuri
Ricardo Usbeck
Jens Lehmann
75
12
0
17 Mar 2022
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
Bei Li
Quan Du
Tao Zhou
Yi Jing
Shuhan Zhou
Xin Zeng
Tong Xiao
JingBo Zhu
Xuebo Liu
Min Zhang
59
35
0
17 Mar 2022
EDTER: Edge Detection with Transformer
Mengyang Pu
Yaping Huang
Yuming Liu
Q. Guan
Haibin Ling
ViT
105
101
0
16 Mar 2022
Unified Visual Transformer Compression
Shixing Yu
Tianlong Chen
Jiayi Shen
Huan Yuan
Jianchao Tan
Sen Yang
Ji Liu
Zhangyang Wang
ViT
94
94
0
15 Mar 2022
SCD: Self-Contrastive Decorrelation for Sentence Embeddings
T. Klein
Moin Nabi
SSL
54
26
0
15 Mar 2022
Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning
Qinjie Lin
Han Liu
B. Sengupta
OffRL
72
12
0
14 Mar 2022
WCL-BBCD: A Contrastive Learning and Knowledge Graph Approach to Named Entity Recognition
Renjie Zhou
Qian Hu
Jian Wan
Jilin Zhang
Qiang Liu
Tianxiang Hu
Jian Li
59
3
0
14 Mar 2022
PERT: Pre-training BERT with Permuted Language Model
Yiming Cui
Ziqing Yang
Ting Liu
85
37
0
14 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
91
110
0
14 Mar 2022
Can pre-trained Transformers be used in detecting complex sensitive sentences? -- A Monsanto case study
Roelien C. Timmer
David Liebowitz
Surya Nepal
S. Kanhere
50
8
0
14 Mar 2022
BiBERT: Accurate Fully Binarized BERT
Haotong Qin
Yifu Ding
Mingyuan Zhang
Qing Yan
Aishan Liu
Qingqing Dang
Ziwei Liu
Xianglong Liu
MQ
71
95
0
12 Mar 2022
Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers
Stefan Haller
Adina Aldea
C. Seifert
N. Strisciuglio
67
39
0
11 Mar 2022
BERTopic: Neural topic modeling with a class-based TF-IDF procedure
M. Grootendorst
173
1,513
0
11 Mar 2022
MVP: Multimodality-guided Visual Pre-training
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
88
108
0
10 Mar 2022
Speciesist Language and Nonhuman Animal Bias in English Masked Language Models
Masashi Takeshita
Rafal Rzepka
K. Araki
88
7
0
10 Mar 2022
PALI-NLP at SemEval-2022 Task 4: Discriminative Fine-tuning of Transformers for Patronizing and Condescending Language Detection
Dou Hu
Mengyuan Zhou
Xiyang Du
Mengfei Yuan
Meizhi Jin
Lian-Xin Jiang
Yang Mo
Xiaofeng Shi
43
7
0
09 Mar 2022
ILDAE: Instance-Level Difficulty Analysis of Evaluation Data
Neeraj Varshney
Swaroop Mishra
Chitta Baral
69
19
0
07 Mar 2022
Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents
Yicheng Zou
Hongwei Liu
Tao Gui
Junzhe Wang
Qi Zhang
M. Tang
Haixiang Li
Dan Wang
DRL
103
31
0
06 Mar 2022
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Tianxiang Sun
Xiangyang Liu
Wei-wei Zhu
Zhichao Geng
Lingling Wu
Yilong He
Yuan Ni
Guotong Xie
Xuanjing Huang
Xipeng Qiu
85
41
0
03 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
102
109
0
02 Mar 2022
DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Carmelo Scribano
Giorgia Franchini
M. Prato
Marko Bertogna
66
26
0
02 Mar 2022
Large-Scale Hate Speech Detection with Cross-Domain Transfer
Cagri Toraman
Furkan Şahinuç
E. Yilmaz
126
63
0
02 Mar 2022