ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
arXiv:1909.11942 (v6, latest)

26 September 2019
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL, AIMat
ArXiv (abs) · PDF · HTML · GitHub (3271★)

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

50 / 2,935 papers shown
Title
RefBERT: Compressing BERT by Referencing to Pre-computed Representations
Xinyi Wang
Haiqing Yang
Liang Zhao
Yang Mo
Jianping Shen
MQ
72
4
0
11 Jun 2021
Deciphering Implicit Hate: Evaluating Automated Detection Algorithms for Multimodal Hate
Austin Botelho
Bertie Vidgen
Scott A. Hale
61
9
0
10 Jun 2021
A Semi-supervised Multi-task Learning Approach to Classify Customer Contact Intents
Li Dong
Matthew C. Spencer
Amir Biagi
50
3
0
10 Jun 2021
GroupBERT: Enhanced Transformer Architecture with Efficient Grouped Structures
Ivan Chelombiev
Daniel Justus
Douglas Orr
A. Dietrich
Frithjof Gressmann
A. Koliousis
Carlo Luschi
60
5
0
10 Jun 2021
CAT: Cross Attention in Vision Transformer
Hezheng Lin
Xingyi Cheng
Xiangyu Wu
Fan Yang
Dong Shen
Zhongyuan Wang
Qing Song
Wei Yuan
ViT
65
158
0
10 Jun 2021
Linguistically Informed Masking for Representation Learning in the Patent Domain
Sophia Althammer
Mark Buckley
Sebastian Hofstatter
Allan Hanbury
59
11
0
10 Jun 2021
Low-Dimensional Structure in the Space of Language Representations is Reflected in Brain Responses
Richard Antonello
Javier S. Turek
Vy A. Vo
Alexander G. Huth
83
42
0
09 Jun 2021
Eye of the Beholder: Improved Relation Generalization for Text-based Reinforcement Learning Agents
K. Murugesan
Subhajit Chaudhury
Kartik Talamadupula
99
5
0
09 Jun 2021
Cocktail: Leveraging Ensemble Learning for Optimized Model Serving in Public Cloud
Jashwant Raj Gunasekaran
Cyan Subhra Mishra
P. Thinakaran
M. Kandemir
Chita R. Das
35
3
0
09 Jun 2021
Bayesian Attention Belief Networks
Shujian Zhang
Xinjie Fan
Bo Chen
Mingyuan Zhou
BDL
110
32
0
09 Jun 2021
Key Information Extraction From Documents: Evaluation And Generator
Oliver Bensch
Mirela C. Popa
Constantin Spille
42
14
0
09 Jun 2021
Automatic Sexism Detection with Multilingual Transformer Models
Mina Schütz
Jaqueline Boeck
Daria Liakhovets
D. Slijepcevic
Armin Kirchknopf
Manuel Hecht
Johannes Bogensperger
S. Schlarb
Alexander Schindler
Matthias Zeppelzauer
34
29
0
09 Jun 2021
TIMEDIAL: Temporal Commonsense Reasoning in Dialog
Lianhui Qin
Aditya Gupta
Shyam Upadhyay
Luheng He
Yejin Choi
Manaal Faruqui
LRM
98
72
0
08 Jun 2021
Can You Learn an Algorithm? Generalizing from Easy to Hard Problems with Recurrent Networks
Avi Schwarzschild
Eitan Borgnia
Arjun Gupta
Furong Huang
U. Vishkin
Micah Goldblum
Tom Goldstein
103
77
0
08 Jun 2021
CLTR: An End-to-End, Transformer-Based System for Cell Level Table Retrieval and Table Question Answering
FeiFei Pan
Mustafa Canim
Michael R. Glass
A. Gliozzo
Peter Fox
VLM, LMTD
58
26
0
08 Jun 2021
Staircase Attention for Recurrent Processing of Sequences
Da Ju
Stephen Roller
Sainbayar Sukhbaatar
Jason Weston
84
11
0
08 Jun 2021
Exploiting Language Relatedness for Low Web-Resource Language Model Adaptation: An Indic Languages Study
Yash Khemchandani
Sarvesh Mehtani
Vaidehi Patil
Abhijeet Awasthi
Partha P. Talukdar
Sunita Sarawagi
76
32
0
07 Jun 2021
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning
Piotr Piękos
Henryk Michalewski
Mateusz Malinowski
76
28
0
07 Jun 2021
Refiner: Refining Self-attention for Vision Transformers
Daquan Zhou
Yujun Shi
Bingyi Kang
Weihao Yu
Zihang Jiang
Yuan Li
Xiaojie Jin
Qibin Hou
Jiashi Feng
ViT
96
62
0
07 Jun 2021
PROST: Physical Reasoning of Objects through Space and Time
Stéphane Aroca-Ouellette
Cory Paik
Alessandro Roncone
Katharina Kann
LRM
80
49
0
07 Jun 2021
RoSearch: Search for Robust Student Architectures When Distilling Pre-trained Language Models
Xin Guo
Jianlei Yang
Haoyi Zhou
Xucheng Ye
Jianxin Li
52
1
0
07 Jun 2021
Relative Importance in Sentence Processing
Nora Hollenstein
Lisa Beinborn
FAtt
82
32
0
07 Jun 2021
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
Yufei Xu
Qiming Zhang
Jing Zhang
Dacheng Tao
ViT
209
342
0
07 Jun 2021
Understand and Improve Contrastive Learning Methods for Visual Representation: A Review
Ran Liu
SSL
62
12
0
06 Jun 2021
Transient Chaos in BERT
Katsuma Inoue
Soh Ohara
Yasuo Kuniyoshi
Kohei Nakajima
55
3
0
06 Jun 2021
MergeDistill: Merging Pre-trained Language Models using Distillation
Simran Khanuja
Melvin Johnson
Partha P. Talukdar
84
16
0
05 Jun 2021
Meta-Learning with Fewer Tasks through Task Interpolation
Huaxiu Yao
Linjun Zhang
Chelsea Finn
114
56
0
04 Jun 2021
BERT-Based Sentiment Analysis: A Software Engineering Perspective
Himanshu Batra
Narinder Singh Punn
S. K. Sonbhadra
Sonali Agarwal
107
36
0
04 Jun 2021
You Only Compress Once: Towards Effective and Elastic BERT Compression via Exploit-Explore Stochastic Nature Gradient
Shaokun Zhang
Xiawu Zheng
Chenyi Yang
Yuchao Li
Yan Wang
Yong Li
Mengdi Wang
Shen Li
Jun Yang
Rongrong Ji
MQ
97
23
0
04 Jun 2021
ERNIE-Tiny : A Progressive Distillation Framework for Pretrained Transformer Compression
Weiyue Su
Xuyi Chen
Shi Feng
Jiaxiang Liu
Weixin Liu
Yu Sun
Hao Tian
Hua Wu
Haifeng Wang
81
13
0
04 Jun 2021
Enabling Lightweight Fine-tuning for Pre-trained Language Model Compression based on Matrix Product Operators
Peiyu Liu
Ze-Feng Gao
Wayne Xin Zhao
Z. Xie
Zhong-Yi Lu
Ji-Rong Wen
40
30
0
04 Jun 2021
Self-supervised Dialogue Learning for Spoken Conversational Question Answering
Nuo Chen
Chenyu You
Yuexian Zou
SSL
91
34
0
04 Jun 2021
The Case for Translation-Invariant Self-Attention in Transformer-Based Language Models
Ulme Wennberg
G. Henter
MILM
93
22
0
03 Jun 2021
E2E-VLP: End-to-End Vision-Language Pre-training Enhanced by Visual Learning
Haiyang Xu
Ming Yan
Chenliang Li
Bin Bi
Songfang Huang
Wenming Xiao
Fei Huang
VLM
113
119
0
03 Jun 2021
Transformers are Deep Infinite-Dimensional Non-Mercer Binary Kernel Machines
Matthew A. Wright
Joseph E. Gonzalez
84
23
0
02 Jun 2021
Towards Deeper Deep Reinforcement Learning with Spectral Normalization
Johan Bjorck
Carla P. Gomes
Kilian Q. Weinberger
93
23
0
02 Jun 2021
A Multi-Level Attention Model for Evidence-Based Fact Checking
Canasai Kruengkrai
Junichi Yamagishi
Xin Wang
GNN
52
26
0
02 Jun 2021
Conversational Question Answering: A Survey
Munazza Zaib
Wei Emma Zhang
Quan Z. Sheng
A. Mahmood
Yang Zhang
89
91
0
02 Jun 2021
Claim Matching Beyond English to Scale Global Fact-Checking
Ashkan Kazemi
Kiran Garimella
Devin Gaffney
Scott A. Hale
77
60
0
01 Jun 2021
Comparing Test Sets with Item Response Theory
Clara Vania
Phu Mon Htut
William Huang
Dhara Mungra
Richard Yuanzhe Pang
Jason Phang
Haokun Liu
Kyunghyun Cho
Sam Bowman
77
43
0
01 Jun 2021
What Ingredients Make for an Effective Crowdsourcing Protocol for Difficult NLU Data Collection Tasks?
Nikita Nangia
Saku Sugawara
H. Trivedi
Alex Warstadt
Clara Vania
Sam Bowman
136
36
0
01 Jun 2021
Dialogue-oriented Pre-training
Yi Xu
Hai Zhao
73
14
0
01 Jun 2021
Sub-Character Tokenization for Chinese Pretrained Language Models
Chenglei Si
Zhengyan Zhang
Yingfa Chen
Fanchao Qi
Xiaozhi Wang
Zhiyuan Liu
Yasheng Wang
Qun Liu
Maosong Sun
53
12
0
01 Jun 2021
Improving the Adversarial Robustness for Speaker Verification by Self-Supervised Learning
Haibin Wu
Xu Li
Andy T. Liu
Zhiyong Wu
Helen Meng
Hung-yi Lee
AAML, SSL
116
30
0
01 Jun 2021
Adversarial VQA: A New Benchmark for Evaluating the Robustness of VQA Models
Linjie Li
Jie Lei
Zhe Gan
Jingjing Liu
AAML, VLM
114
75
0
01 Jun 2021
Choose a Transformer: Fourier or Galerkin
Shuhao Cao
90
256
0
31 May 2021
SemEval-2021 Task 4: Reading Comprehension of Abstract Meaning
Boyuan Zheng
Xiaoyu Yang
Yu-Ping Ruan
Zhen-Hua Ling
Quan Liu
Si Wei
Xiao-Dan Zhu
ELM
44
13
0
31 May 2021
LEAP: Learnable Pruning for Transformer-based Models
Z. Yao
Xiaoxia Wu
Linjian Ma
Sheng Shen
Kurt Keutzer
Michael W. Mahoney
Yuxiong He
62
7
0
30 May 2021
Neural Models for Offensive Language Detection
Ehab Hamdy
31
4
0
30 May 2021
Pre-training Universal Language Representation
Yian Li
Hai Zhao
SSL
62
8
0
30 May 2021