ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut · 26 September 2019 · arXiv: 1909.11942 · Tags: SSL, AIMat
Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations"
50 of 2,913 citing papers shown.
Gestalt: a Stacking Ensemble for SQuAD2.0
Mohamed El-Geish · 02 Apr 2020

Deep Entity Matching with Pre-Trained Language Models
Yuliang Li, Jinfeng Li, Yoshihiko Suhara, A. Doan, W. Tan · 01 Apr 2020 · Tags: VLM

Information Leakage in Embedding Models
Congzheng Song, A. Raghunathan · 31 Mar 2020 · Tags: MIACV

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He · 29 Mar 2020 · Tags: AI4CE

Felix: Flexible Text Editing Through Tagging and Insertion
Jonathan Mallinson, Aliaksei Severyn, Eric Malmi, Guillermo Garrido · 24 Mar 2020

Data-driven models and computational tools for neurolinguistics: a language technology perspective
Ekaterina Artemova, Amir Bakarov, A. Artemov, E. Burnaev, M. Sharaev · 23 Mar 2020

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang · 18 Mar 2020 · Tags: LM&MA, VLM

Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett · 17 Mar 2020 · Tags: UQLM

A Survey on Contextual Embeddings
Qi Liu, Matt J. Kusner, Phil Blunsom · 16 Mar 2020

TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Zhiheng Huang, Peng Xu, Davis Liang, Ajay K. Mishra, Bing Xiang · 16 Mar 2020

A Survey of End-to-End Driving: Architectures and Training Methods
Ardi Tampuu, Maksym Semikin, Naveed Muhammad, D. Fishman, Tambet Matiisen · 13 Mar 2020 · Tags: 3DV

Learning to Encode Position for Transformer with Continuous Dynamical Model
Xuanqing Liu, Hsiang-Fu Yu, Inderjit Dhillon, Cho-Jui Hsieh · 13 Mar 2020

Multi-SimLex: A Large-Scale Evaluation of Multilingual and Cross-Lingual Lexical Semantic Similarity
Ivan Vulić, Simon Baker, E. Ponti, Ulla Petti, Ira Leviant, ..., Eden Bar, Matt Malone, Thierry Poibeau, Roi Reichart, Anna Korhonen · 10 Mar 2020

A Framework for Evaluation of Machine Reading Comprehension Gold Standards
Viktor Schlegel, Marco Valentino, André Freitas, Goran Nenadic, R. Batista-Navarro · 10 Mar 2020

What the [MASK]? Making Sense of Language-Specific BERT Models
Debora Nozza, Federico Bianchi, Dirk Hovy · 05 Mar 2020

Talking-Heads Attention
Noam M. Shazeer, Zhenzhong Lan, Youlong Cheng, Nan Ding, L. Hou · 05 Mar 2020

jiant: A Software Toolkit for Research on General-Purpose Text Understanding Models
Yada Pruksachatkun, Philip Yeres, Haokun Liu, Jason Phang, Phu Mon Htut, Alex Jinpeng Wang, Ian Tenney, Samuel R. Bowman · 04 Mar 2020 · Tags: SSeg

AraBERT: Transformer-based Model for Arabic Language Understanding
Wissam Antoun, Fady Baly, Hazem M. Hajj · 28 Feb 2020

UniLMv2: Pseudo-Masked Language Models for Unified Language Model Pre-Training
Hangbo Bao, Li Dong, Furu Wei, Wenhui Wang, Nan Yang, ..., Yu-Chiang Frank Wang, Songhao Piao, Jianfeng Gao, Ming Zhou, H. Hon · 28 Feb 2020 · Tags: AI4CE

TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu · 28 Feb 2020 · Tags: VLM

On Biased Compression for Distributed Learning
Aleksandr Beznosikov, Samuel Horváth, Peter Richtárik, M. Safaryan · 27 Feb 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky · 27 Feb 2020 · Tags: OffRL

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett · 27 Feb 2020 · Tags: AI4CE

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez · 26 Feb 2020

Multi-task Learning with Multi-head Attention for Multi-choice Reading Comprehension
H. Wan · 26 Feb 2020

KEML: A Knowledge-Enriched Meta-Learning Framework for Lexical Relation Classification
Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He · 25 Feb 2020 · Tags: VLM, KELM

Exploring BERT Parameter Efficiency on the Stanford Question Answering Dataset v2.0
Eric Hulburd · 25 Feb 2020

Do Multi-Hop Question Answering Systems Know How to Answer the Single-Hop Sub-Questions?
Yixuan Tang, Hwee Tou Ng, A. Tung · 23 Feb 2020

Investigating Typed Syntactic Dependencies for Targeted Sentiment Classification Using Graph Attention Neural Network
Xuefeng Bai, Pengbo Liu, Yue Zhang · 22 Feb 2020 · Tags: GNN

Training Question Answering Models From Synthetic Data
Raul Puri, Ryan Spring, M. Patwary, M. Shoeybi, Bryan Catanzaro · 22 Feb 2020 · Tags: ELM

CoLES: Contrastive Learning for Event Sequences with Self-Supervision
Dmitrii Babaev, Ivan Kireev, Nikita Ovsov, Maria Ivanova, Gleb Gusev, Ivan Nazarov, Alexander Tuzhilin · 19 Feb 2020 · Tags: SSL, AI4TS

Convergence of End-to-End Training in Deep Unsupervised Contrastive Learning
Zixin Wen · 17 Feb 2020 · Tags: SSL

SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models
Bin Wang, C.-C. Jay Kuo · 16 Feb 2020

Towards Detection of Subjective Bias using Contextualized Word Embeddings
Tanvi Dadu, Kartikey Pant, R. Mamidi · 16 Feb 2020

Fine-Tuning Pretrained Language Models: Weight Initializations, Data Orders, and Early Stopping
Jesse Dodge, Gabriel Ilharco, Roy Schwartz, Ali Farhadi, Hannaneh Hajishirzi, Noah A. Smith · 15 Feb 2020

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu, Jian Jiao, Ruofei Zhang · 14 Feb 2020

Transformer on a Diet
Chenguang Wang, Zihao Ye, Aston Zhang, Zheng-Wei Zhang, Alex Smola · 14 Feb 2020

HULK: An Energy Efficiency Benchmark Platform for Responsible Natural Language Processing
Xiyou Zhou, Zhiyu Zoey Chen, Xiaoyong Jin, Luu Anh Tuan · 14 Feb 2020

How Much Knowledge Can You Pack Into the Parameters of a Language Model?
Adam Roberts, Colin Raffel, Noam M. Shazeer · 10 Feb 2020 · Tags: KELM

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou · 07 Feb 2020

Aligning the Pretraining and Finetuning Objectives of Language Models
Nuo Wang Pierse, Jing Lu · 05 Feb 2020 · Tags: AI4CE

Pseudo-Bidirectional Decoding for Local Sequence Transduction
Wangchunshu Zhou, Tao Ge, Ke Xu · 31 Jan 2020

Bringing Stories Alive: Generating Interactive Fiction Worlds
Prithviraj Ammanabrolu, W. Cheung, Dan Tu, William Broniec, Mark O. Riedl · 28 Jan 2020

Retrospective Reader for Machine Reading Comprehension
Zhuosheng Zhang, Junjie Yang, Hai Zhao · 27 Jan 2020 · Tags: RALM

DUMA: Reading Comprehension with Transposition Thinking
Pengfei Zhu, Hai Zhao, Xiaoguang Li · 26 Jan 2020 · Tags: AI4CE

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation
Dongling Xiao, Han Zhang, Yukun Li, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang · 26 Jan 2020

BERT's output layer recognizes all hidden layers? Some Intriguing Phenomena and a simple way to boost BERT
Wei-Tsung Kao, Tsung-Han Wu, Po-Han Chi, Chun-Cheng Hsieh, Hung-yi Lee · 25 Jan 2020 · Tags: SSL

Multi-task self-supervised learning for Robust Speech Recognition
Mirco Ravanelli, Jianyuan Zhong, Santiago Pascual, P. Swietojanski, João Monteiro, J. Trmal, Yoshua Bengio · 25 Jan 2020 · Tags: SSL

PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Saurabh Goyal, Anamitra R. Choudhury, Saurabh ManishRaje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma · 24 Jan 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei · 23 Jan 2020