ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Versions: v1, v2, v3, v4, v5, v6 (latest)

26 September 2019
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL, AIMat
ArXiv (abs) · PDF · HTML · GitHub (3,271★)

Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"

50 / 2,935 papers shown
Cascade Neural Ensemble for Identifying Scientifically Sound Articles
Ashwin Karthik Ambalavanan, M. Devarakonda
38 · 1 · 0
13 Apr 2020

Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction
Hong Guan, Jianfu Li, Hua Xu, M. Devarakonda
15 · 11 · 0
13 Apr 2020

Pretrained Transformers Improve Out-of-Distribution Robustness
Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, R. Krishnan, Basel Alomair
OOD
221 · 436 · 0
13 Apr 2020

CLUE: A Chinese Language Understanding Evaluation Benchmark
Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, ..., Cong Yue, Xinrui Zhang, Zhen-Yi Yang, Kyle Richardson, Zhenzhong Lan
ELM
110 · 388 · 0
13 Apr 2020

Explaining Question Answering Models through Text Generation
Veronica Latcinnik, Jonathan Berant
LRM
96 · 51 · 0
12 Apr 2020

Multimodal Categorization of Crisis Events in Social Media
Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel R. Tetreault, A. Jaimes
98 · 88 · 0
10 Apr 2020

Designing Precise and Robust Dialogue Response Evaluators
Tianyu Zhao, Divesh Lala, Tatsuya Kawahara
57 · 53 · 0
10 Apr 2020

Telling BERT's full story: from Local Attention to Global Aggregation
Damian Pascual, Gino Brunner, Roger Wattenhofer
57 · 19 · 0
10 Apr 2020

Injecting Numerical Reasoning Skills into Language Models
Mor Geva, Ankit Gupta, Jonathan Berant
AIMat, LRM
93 · 227 · 0
09 Apr 2020

Generating Counter Narratives against Online Hate Speech: Data and Strategies
Serra Sinem Tekiroğlu, Yi-Ling Chung, Marco Guerini
59 · 112 · 0
08 Apr 2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
MQ
91 · 323 · 0
08 Apr 2020

Analyzing Redundancy in Pretrained Transformer Models
Fahim Dalvi, Hassan Sajjad, Nadir Durrani, Yonatan Belinkov
37 · 2 · 0
08 Apr 2020

On the Effect of Dropping Layers of Pre-trained Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
71 · 143 · 0
08 Apr 2020

DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement
Tianda Li, Jia-Chen Gu, Xiao-Dan Zhu, Quan Liu, Zhenhua Ling, Zhiming Su, Si Wei
70 · 28 · 0
08 Apr 2020

Towards Evaluating the Robustness of Chinese BERT Classifiers
Wei Ping, Boyuan Pan, Xin Li, Yue Liu
AAML
77 · 8 · 0
07 Apr 2020

Byte Pair Encoding is Suboptimal for Language Model Pretraining
Kaj Bostrom, Greg Durrett
69 · 214 · 0
07 Apr 2020

Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
Changmao Li, Jinho Choi
51 · 26 · 0
07 Apr 2020

A Few Topical Tweets are Enough for Effective User-Level Stance Detection
Younes Samih, Kareem Darwish
29 · 7 · 0
07 Apr 2020

Deep Learning Based Text Classification: A Comprehensive Review
Shervin Minaee, Nal Kalchbrenner, Min Zhang, Narjes Nikzad, M. Asgari-Chenaghlu, Jianfeng Gao
AILaw, VLM, AI4TS
116 · 1,115 · 0
06 Apr 2020

Continual Domain-Tuning for Pretrained Language Models
Subendhu Rongali, Abhyuday N. Jagannatha, Bhanu Pratap Singh Rawat, Hong-ye Yu
CLL, KELM
50 · 7 · 0
05 Apr 2020

FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
97 · 361 · 0
05 Apr 2020

Finding Black Cat in a Coal Cellar -- Keyphrase Extraction & Keyphrase-Rubric Relationship Classification from Complex Assignments
Manikandan Ravikiran
13 · 0 · 0
03 Apr 2020

Gestalt: a Stacking Ensemble for SQuAD2.0
Mohamed El-Geish
46 · 4 · 0
02 Apr 2020

Deep Entity Matching with Pre-Trained Language Models
Yuliang Li, Jinfeng Li, Yoshihiko Suhara, A. Doan, W. Tan
VLM
108 · 391 · 0
01 Apr 2020

Information Leakage in Embedding Models
Congzheng Song, A. Raghunathan
MIACV
92 · 274 · 0
31 Mar 2020

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
AI4CE
98 · 24 · 0
29 Mar 2020

Felix: Flexible Text Editing Through Tagging and Insertion
Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido
82 · 76 · 0
24 Mar 2020

Data-driven models and computational tools for neurolinguistics: a language technology perspective
Ekaterina Artemova, Amir Bakarov, A. Artemov, Evgeny Burnaev, M. Sharaev
46 · 4 · 0
23 Mar 2020

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
LM&MA, VLM
390 · 1,498 · 0
18 Mar 2020

Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett
UQLM
344 · 302 · 0
17 Mar 2020

A Survey on Contextual Embeddings
Qi Liu, Matt J. Kusner, Phil Blunsom
276 · 151 · 0
16 Mar 2020

TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Zhiheng Huang, Peng Xu, Davis Liang, Ajay K. Mishra, Bing Xiang
40 · 31 · 0
16 Mar 2020

A Survey of End-to-End Driving: Architectures and Training Methods
Ardi Tampuu, Maksym Semikin, Naveed Muhammad, D. Fishman, Tambet Matiisen
3DV
108 · 238 · 0
13 Mar 2020

Learning to Encode Position for Transformer with Continuous Dynamical Model
Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh
85 · 112 · 0
13 Mar 2020

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity
Ivan Vulić, Simon Baker, Edoardo Ponti, Ulla Petti, Ira Leviant, ..., Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, Anna Korhonen
90 · 83 · 0
10 Mar 2020

A Framework for Evaluation of Machine Reading Comprehension Gold Standards
Viktor Schlegel, Marco Valentino, André Freitas, Goran Nenadic, Riza Batista-Navarro
58 · 30 · 0
10 Mar 2020

What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza, Federico Bianchi, Dirk Hovy
162 · 108 · 0
05 Mar 2020

Talking-Heads Attention
Noam M. Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, L. Hou
145 · 80 · 0
05 Mar 2020

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
Yada Pruksachatkun, Philip Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Jinpeng Wang, Ian Tenney, Samuel R. Bowman
SSeg
36 · 94 · 0
04 Mar 2020

AraBERT: Transformer-based Model for Arabic Language Understanding
Wissam Antoun, Fady Baly, Hazem M. Hajj
162 · 975 · 0
28 Feb 2020

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, ..., Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, H. Hon
AI4CE
88 · 397 · 0
28 Feb 2020

TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu
VLM
75 · 48 · 0
28 Feb 2020

On Biased Compression for Distributed Learning
Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, M. Safaryan
78 · 189 · 0
27 Feb 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky
OffRL
137 · 1,511 · 0
27 Feb 2020

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
AI4CE
134 · 201 · 0
27 Feb 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
138 · 151 · 0
26 Feb 2020

Multi-task Learning with Multi-head Attention for Multi-choice Reading Comprehension
H. Wan
122 · 13 · 0
26 Feb 2020

KEML: A Knowledge-Enriched Meta-Learning Framework for Lexical Relation Classification
Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
VLM, KELM
102 · 13 · 0
25 Feb 2020

Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
Eric Hulburd
53 · 5 · 0
25 Feb 2020

Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
Yixuan Tang, Hwee Tou Ng, A. Tung
47 · 34 · 0
23 Feb 2020