arXiv:1909.11942 · Cited By (v6, latest)
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
26 September 2019
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Tags: SSL, AIMat
Links: arXiv (abs) · PDF · HTML · GitHub (3,271★)
Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations" (50 of 2,935 shown)
Cascade Neural Ensemble for Identifying Scientifically Sound Articles
    Ashwin Karthik Ambalavanan, M. Devarakonda · 38 / 1 / 0 · 13 Apr 2020

Robustly Pre-trained Neural Model for Direct Temporal Relation Extraction
    Hong Guan, Jianfu Li, Hua Xu, M. Devarakonda · 15 / 11 / 0 · 13 Apr 2020

Pretrained Transformers Improve Out-of-Distribution Robustness
    Dan Hendrycks, Xiaoyuan Liu, Eric Wallace, Adam Dziedzic, R. Krishnan, Basel Alomair · [OOD] · 221 / 436 / 0 · 13 Apr 2020

CLUE: A Chinese Language Understanding Evaluation Benchmark
    Liang Xu, Hai Hu, Xuanwei Zhang, Lu Li, Chenjie Cao, ..., Cong Yue, Xinrui Zhang, Zhen-Yi Yang, Kyle Richardson, Zhenzhong Lan · [ELM] · 110 / 388 / 0 · 13 Apr 2020

Explaining Question Answering Models through Text Generation
    Veronica Latcinnik, Jonathan Berant · [LRM] · 96 / 51 / 0 · 12 Apr 2020

Multimodal Categorization of Crisis Events in Social Media
    Mahdi Abavisani, Liwei Wu, Shengli Hu, Joel R. Tetreault, A. Jaimes · 98 / 88 / 0 · 10 Apr 2020

Designing Precise and Robust Dialogue Response Evaluators
    Tianyu Zhao, Divesh Lala, Tatsuya Kawahara · 57 / 53 / 0 · 10 Apr 2020

Telling BERT's full story: from Local Attention to Global Aggregation
    Damian Pascual, Gino Brunner, Roger Wattenhofer · 57 / 19 / 0 · 10 Apr 2020

Injecting Numerical Reasoning Skills into Language Models
    Mor Geva, Ankit Gupta, Jonathan Berant · [AIMat, LRM] · 93 / 227 / 0 · 09 Apr 2020

Generating Counter Narratives against Online Hate Speech: Data and Strategies
    Serra Sinem Tekiroğlu, Yi-Ling Chung, Marco Guerini · 59 / 112 / 0 · 08 Apr 2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth
    Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu · [MQ] · 91 / 323 / 0 · 08 Apr 2020

Analyzing Redundancy in Pretrained Transformer Models
    Fahim Dalvi, Hassan Sajjad, Nadir Durrani, Yonatan Belinkov · 37 / 2 / 0 · 08 Apr 2020

On the Effect of Dropping Layers of Pre-trained Transformer Models
    Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov · 71 / 143 / 0 · 08 Apr 2020

DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement
    Tianda Li, Jia-Chen Gu, Xiao-Dan Zhu, Quan Liu, Zhenhua Ling, Zhiming Su, Si Wei · 70 / 28 / 0 · 08 Apr 2020

Towards Evaluating the Robustness of Chinese BERT Classifiers
    Wei Ping, Boyuan Pan, Xin Li, Yue Liu · [AAML] · 77 / 8 / 0 · 07 Apr 2020

Byte Pair Encoding is Suboptimal for Language Model Pretraining
    Kaj Bostrom, Greg Durrett · 69 / 214 / 0 · 07 Apr 2020

Transformers to Learn Hierarchical Contexts in Multiparty Dialogue for Span-based Question Answering
    Changmao Li, Jinho Choi · 51 / 26 / 0 · 07 Apr 2020

A Few Topical Tweets are Enough for Effective User-Level Stance Detection
    Younes Samih, Kareem Darwish · 29 / 7 / 0 · 07 Apr 2020

Deep Learning Based Text Classification: A Comprehensive Review
    Shervin Minaee, Nal Kalchbrenner, Min Zhang, Narjes Nikzad, M. Asgari-Chenaghlu, Jianfeng Gao · [AILaw, VLM, AI4TS] · 116 / 1,115 / 0 · 06 Apr 2020

Continual Domain-Tuning for Pretrained Language Models
    Subendhu Rongali, Abhyuday N. Jagannatha, Bhanu Pratap Singh Rawat, Hong-ye Yu · [CLL, KELM] · 50 / 7 / 0 · 05 Apr 2020

FastBERT: a Self-distilling BERT with Adaptive Inference Time
    Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju · 97 / 361 / 0 · 05 Apr 2020

Finding Black Cat in a Coal Cellar -- Keyphrase Extraction & Keyphrase-Rubric Relationship Classification from Complex Assignments
    Manikandan Ravikiran · 13 / 0 / 0 · 03 Apr 2020

Gestalt: a Stacking Ensemble for SQuAD2.0
    Mohamed El-Geish · 46 / 4 / 0 · 02 Apr 2020

Deep Entity Matching with Pre-Trained Language Models
    Yuliang Li, Jinfeng Li, Yoshihiko Suhara, A. Doan, W. Tan · [VLM] · 108 / 391 / 0 · 01 Apr 2020

Information Leakage in Embedding Models
    Congzheng Song, A. Raghunathan · [MIACV] · 92 / 274 / 0 · 31 Mar 2020

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
    Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He · [AI4CE] · 98 / 24 / 0 · 29 Mar 2020

Felix: Flexible Text Editing Through Tagging and Insertion
    Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido · 82 / 76 / 0 · 24 Mar 2020

Data-driven models and computational tools for neurolinguistics: a language technology perspective
    Ekaterina Artemova, Amir Bakarov, A. Artemov, Evgeny Burnaev, M. Sharaev · 46 / 4 / 0 · 23 Mar 2020

Pre-trained Models for Natural Language Processing: A Survey
    Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang · [LM&MA, VLM] · 390 / 1,498 / 0 · 18 Mar 2020

Calibration of Pre-trained Transformers
    Shrey Desai, Greg Durrett · [UQLM] · 344 / 302 / 0 · 17 Mar 2020

A Survey on Contextual Embeddings
    Qi Liu, Matt J. Kusner, Phil Blunsom · 276 / 151 / 0 · 16 Mar 2020

TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
    Zhiheng Huang, Peng Xu, Davis Liang, Ajay K. Mishra, Bing Xiang · 40 / 31 / 0 · 16 Mar 2020

A Survey of End-to-End Driving: Architectures and Training Methods
    Ardi Tampuu, Maksym Semikin, Naveed Muhammad, D. Fishman, Tambet Matiisen · [3DV] · 108 / 238 / 0 · 13 Mar 2020

Learning to Encode Position for Transformer with Continuous Dynamical Model
    Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh · 85 / 112 / 0 · 13 Mar 2020

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity
    Ivan Vulić, Simon Baker, Edoardo Ponti, Ulla Petti, Ira Leviant, ..., Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, Anna Korhonen · 90 / 83 / 0 · 10 Mar 2020

A Framework for Evaluation of Machine Reading Comprehension Gold Standards
    Viktor Schlegel, Marco Valentino, André Freitas, Goran Nenadic, Riza Batista-Navarro · 58 / 30 / 0 · 10 Mar 2020

What the [MASK]? Making Sense of Language-Specific BERT Models
    Debora Nozza, Federico Bianchi, Dirk Hovy · 162 / 108 / 0 · 05 Mar 2020

Talking-Heads Attention
    Noam M. Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, L. Hou · 145 / 80 / 0 · 05 Mar 2020

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
    Yada Pruksachatkun, Philip Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Jinpeng Wang, Ian Tenney, Samuel R. Bowman · [SSeg] · 36 / 94 / 0 · 04 Mar 2020

AraBERT: Transformer-based Model for Arabic Language Understanding
    Wissam Antoun, Fady Baly, Hazem M. Hajj · 162 / 975 / 0 · 28 Feb 2020

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
    Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, ..., Yu Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, H. Hon · [AI4CE] · 88 / 397 / 0 · 28 Feb 2020

TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
    Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu · [VLM] · 75 / 48 / 0 · 28 Feb 2020

On Biased Compression for Distributed Learning
    Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, M. Safaryan · 78 / 189 / 0 · 27 Feb 2020

A Primer in BERTology: What we know about how BERT works
    Anna Rogers, Olga Kovaleva, Anna Rumshisky · [OffRL] · 137 / 1,511 / 0 · 27 Feb 2020

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
    Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett · [AI4CE] · 134 / 201 / 0 · 27 Feb 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
    Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez · 138 / 151 / 0 · 26 Feb 2020

Multi-task Learning with Multi-head Attention for Multi-choice Reading Comprehension
    H. Wan · 122 / 13 / 0 · 26 Feb 2020

KEML: A Knowledge-Enriched Meta-Learning Framework for Lexical Relation Classification
    Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He · [VLM, KELM] · 102 / 13 / 0 · 25 Feb 2020

Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
    Eric Hulburd · 53 / 5 / 0 · 25 Feb 2020

Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
    Yixuan Tang, Hwee Tou Ng, A. Tung · 47 / 34 / 0 · 23 Feb 2020