ALBERT: A Lite BERT for Self-supervised Learning of Language Representations

26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
    SSL, AIMat
ArXiv (abs) | PDF | HTML | Github (3271★)

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

50 / 2,935 papers shown
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang
A. Khetan
Rene Bidart
Zohar Karnin
53
16
0
27 Mar 2022
Lite Unified Modeling for Discriminative Reading Comprehension
Yilin Zhao
Hai Zhao
Libin Shen
Yinggong Zhao
86
2
0
26 Mar 2022
On the Intrinsic and Extrinsic Fairness Evaluation Metrics for Contextualized Language Representations
Yang Trista Cao
Yada Pruksachatkun
Kai-Wei Chang
Rahul Gupta
Varun Kumar
Jwala Dhamala
Aram Galstyan
74
99
0
25 Mar 2022
A Comparative Evaluation Of Transformer Models For De-Identification Of Clinical Text Data
C. Meaney
Wali Hakimpour
S. Kalia
R. Moineddin
37
7
0
25 Mar 2022
Token Dropping for Efficient BERT Pretraining
Le Hou
Richard Yuanzhe Pang
Dinesh Manocha
Yuexin Wu
Xinying Song
Xiaodan Song
Denny Zhou
85
46
0
24 Mar 2022
minicons: Enabling Flexible Behavioral and Representational Analyses of Transformer Language Models
Kanishka Misra
84
63
0
24 Mar 2022
Ensembling and Knowledge Distilling of Large Sequence Taggers for Grammatical Error Correction
M. Tarnavskyi
Artem Chernodub
Kostiantyn Omelianchuk
3DV
59
26
0
24 Mar 2022
Generating Data to Mitigate Spurious Correlations in Natural Language Inference Datasets
Yuxiang Wu
Matt Gardner
Pontus Stenetorp
Pradeep Dasigi
95
68
0
24 Mar 2022
FAMLP: A Frequency-Aware MLP-Like Architecture For Domain Generalization
Kecheng Zheng
Yang Cao
Kai Zhu
Ruijing Zhao
Zhengjun Zha
80
6
0
24 Mar 2022
Bilaterally Slimmable Transformer for Elastic and Efficient Visual Question Answering
Zhou Yu
Zitian Jin
Jun Yu
Mingliang Xu
Hongbo Wang
Jianping Fan
67
4
0
24 Mar 2022
Linearizing Transformer with Key-Value Memory
Yizhe Zhang
Deng Cai
120
6
0
23 Mar 2022
ERNIE-SPARSE: Learning Hierarchical Efficient Transformer Through Regularized Self-Attention
Yang Liu
Jiaxiang Liu
L. Chen
Yuxiang Lu
Shi Feng
Zhida Feng
Yu Sun
Hao Tian
Huancheng Wu
Hai-feng Wang
70
9
0
23 Mar 2022
Transformer based ensemble for emotion detection
Aditya Kane
Shantanu Patankar
Sahil Khose
Neeraja Kirtane
73
10
0
22 Mar 2022
Reinforcement-based frugal learning for satellite image change detection
Sebastien Deschamps
H. Sahbi
69
1
0
22 Mar 2022
Frugal Learning of Virtual Exemplars for Label-Efficient Satellite Image Change Detection
H. Sahbi
Sebastien Deschamps
60
0
0
22 Mar 2022
Task-guided Disentangled Tuning for Pretrained Language Models
Jiali Zeng
Yu Jiang
Shuangzhi Wu
Yongjing Yin
Mu Li
DRL
150
3
0
22 Mar 2022
Language modeling via stochastic processes
Rose E. Wang
Esin Durmus
Noah D. Goodman
Tatsunori Hashimoto
BDL, AI4TS
117
25
0
21 Mar 2022
DQ-BART: Efficient Sequence-to-Sequence Model via Joint Distillation and Quantization
Zheng Li
Zijian Wang
Ming Tan
Ramesh Nallapati
Parminder Bhatia
Andrew O. Arnold
Bing Xiang
Dan Roth
MQ
78
44
0
21 Mar 2022
Masked Discrimination for Self-Supervised Learning on Point Clouds
Haotian Liu
Mu Cai
Yong Jae Lee
3DPC
126
172
0
21 Mar 2022
TCM-SD: A Benchmark for Probing Syndrome Differentiation via Natural Language Processing
Mucheng Ren
Heyan Huang
Yuxiang Zhou
Qianwen Cao
Yu Bu
Yang Gao
53
11
0
21 Mar 2022
Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
80
104
0
21 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová
Marian Verhelst
92
5
0
20 Mar 2022
Cluster & Tune: Boost Cold Start Performance in Text Classification
Eyal Shnarch
Ariel Gera
Alon Halfon
Lena Dankin
Leshem Choshen
R. Aharonov
Noam Slonim
67
22
0
20 Mar 2022
How does the pre-training objective affect what large language models learn about linguistic properties?
Ahmed Alajrami
Nikolaos Aletras
82
20
0
20 Mar 2022
Clickbait Spoiling via Question Answering and Passage Retrieval
Matthias Hagen
Maik Fröbe
Artur Jurk
Martin Potthast
92
35
0
19 Mar 2022
Learning Compressed Embeddings for On-Device Inference
Niketan Pansare
J. Katukuri
Aditya Arora
F. Cipollone
R. Shaik
Noyan Tokgozoglu
Chandru Venkataraman
101
15
0
18 Mar 2022
elBERto: Self-supervised Commonsense Learning for Question Answering
Xunlin Zhan
Yuan Li
Xiao Dong
Xiaodan Liang
Zhiting Hu
Lawrence Carin
SSL, RALM, LRM
77
8
0
17 Mar 2022
Confidence Calibration for Intent Detection via Hyperspherical Space and Rebalanced Accuracy-Uncertainty Loss
Yantao Gong
Cao Liu
Fan Yang
Xunliang Cai
Guanglu Wan
Jiansong Chen
Weipeng Zhang
Houfeng Wang
UQCV
60
2
0
17 Mar 2022
RoMe: A Robust Metric for Evaluating Natural Language Generation
Md. Rony
Liubov Kovriguina
Debanjan Chaudhuri
Ricardo Usbeck
Jens Lehmann
75
12
0
17 Mar 2022
ODE Transformer: An Ordinary Differential Equation-Inspired Model for Sequence Generation
Bei Li
Quan Du
Tao Zhou
Yi Jing
Shuhan Zhou
Xin Zeng
Tong Xiao
JingBo Zhu
Xuebo Liu
Min Zhang
59
35
0
17 Mar 2022
EDTER: Edge Detection with Transformer
Mengyang Pu
Yaping Huang
Yuming Liu
Q. Guan
Haibin Ling
ViT
105
101
0
16 Mar 2022
Unified Visual Transformer Compression
Shixing Yu
Tianlong Chen
Jiayi Shen
Huan Yuan
Jianchao Tan
Sen Yang
Ji Liu
Zhangyang Wang
ViT
94
94
0
15 Mar 2022
SCD: Self-Contrastive Decorrelation for Sentence Embeddings
T. Klein
Moin Nabi
SSL
54
26
0
15 Mar 2022
Switch Trajectory Transformer with Distributional Value Approximation for Multi-Task Reinforcement Learning
Qinjie Lin
Han Liu
B. Sengupta
OffRL
72
12
0
14 Mar 2022
WCL-BBCD: A Contrastive Learning and Knowledge Graph Approach to Named Entity Recognition
Renjie Zhou
Qian Hu
Jian Wan
Jilin Zhang
Qiang Liu
Tianxiang Hu
Jian Li
59
3
0
14 Mar 2022
PERT: Pre-training BERT with Permuted Language Model
Yiming Cui
Ziqing Yang
Ting Liu
85
37
0
14 Mar 2022
SUPERB-SG: Enhanced Speech processing Universal PERformance Benchmark for Semantic and Generative Capabilities
Hsiang-Sheng Tsai
Heng-Jui Chang
Wen-Chin Huang
Zili Huang
Kushal Lakhotia
...
Hsuan-Jui Chen
Shang-Wen Li
Shinji Watanabe
Abdel-rahman Mohamed
Hung-yi Lee
91
110
0
14 Mar 2022
Can pre-trained Transformers be used in detecting complex sensitive sentences? -- A Monsanto case study
Roelien C. Timmer
David Liebowitz
Surya Nepal
S. Kanhere
50
8
0
14 Mar 2022
BiBERT: Accurate Fully Binarized BERT
Haotong Qin
Yifu Ding
Mingyuan Zhang
Qing Yan
Aishan Liu
Qingqing Dang
Ziwei Liu
Xianglong Liu
MQ
71
95
0
12 Mar 2022
Survey on Automated Short Answer Grading with Deep Learning: from Word Embeddings to Transformers
Stefan Haller
Adina Aldea
C. Seifert
N. Strisciuglio
67
39
0
11 Mar 2022
BERTopic: Neural topic modeling with a class-based TF-IDF procedure
M. Grootendorst
173
1,513
0
11 Mar 2022
MVP: Multimodality-guided Visual Pre-training
Longhui Wei
Lingxi Xie
Wen-gang Zhou
Houqiang Li
Qi Tian
88
108
0
10 Mar 2022
Speciesist Language and Nonhuman Animal Bias in English Masked Language Models
Masashi Takeshita
Rafal Rzepka
K. Araki
88
7
0
10 Mar 2022
PALI-NLP at SemEval-2022 Task 4: Discriminative Fine-tuning of Transformers for Patronizing and Condescending Language Detection
Dou Hu
Mengyuan Zhou
Xiyang Du
Mengfei Yuan
Meizhi Jin
Lian-Xin Jiang
Yang Mo
Xiaofeng Shi
43
7
0
09 Mar 2022
ILDAE: Instance-Level Difficulty Analysis of Evaluation Data
Neeraj Varshney
Swaroop Mishra
Chitta Baral
69
19
0
07 Mar 2022
Divide and Conquer: Text Semantic Matching with Disentangled Keywords and Intents
Yicheng Zou
Hongwei Liu
Tao Gui
Junzhe Wang
Qi Zhang
M. Tang
Haixiang Li
Dan Wang
DRL
103
31
0
06 Mar 2022
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Tianxiang Sun
Xiangyang Liu
Wei-wei Zhu
Zhichao Geng
Lingling Wu
Yilong He
Yuan Ni
Guotong Xie
Xuanjing Huang
Xipeng Qiu
85
41
0
03 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
102
109
0
02 Mar 2022
DCT-Former: Efficient Self-Attention with Discrete Cosine Transform
Carmelo Scribano
Giorgia Franchini
M. Prato
Marko Bertogna
66
26
0
02 Mar 2022
Large-Scale Hate Speech Detection with Cross-Domain Transfer
Cagri Toraman
Furkan Şahinuç
E. Yilmaz
126
63
0
02 Mar 2022