ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
26 September 2019
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
SSL · AIMat
arXiv:1909.11942 · GitHub (3,271★)
Papers citing "ALBERT: A Lite BERT for Self-supervised Learning of Language Representations" (showing 50 of 2,935):
HyperGrid: Efficient Multi-Task Transformers with Grid-wise Decomposable Hyper Projections
Yi Tay, Zhe Zhao, Dara Bahri, Donald Metzler, Da-Cheng Juan · 12 Jul 2020 · 66 / 9 / 0
Deep or Simple Models for Semantic Tagging? It Depends on your Data [Experiments]
Jinfeng Li, Yuliang Li, Xiaolan Wang, W. Tan · VLM · 11 Jul 2020 · 36 / 9 / 0
Alleviating the Burden of Labeling: Sentence Generation by Attention Branch Encoder-Decoder Network
Tadashi Ogura, A. Magassouba, K. Sugiura, Tsubasa Hirakawa, Takayoshi Yamashita, H. Fujiyoshi, Hisashi Kawai · 09 Jul 2020 · 48 / 11 / 0
IQ-VQA: Intelligent Visual Question Answering
Vatsal Goel, Mohit Chandak, A. Anand, Prithwijit Guha · 08 Jul 2020 · 64 / 5 / 0
Remix: Rebalanced Mixup
Hsin-Ping Chou, Shih-Chieh Chang, Jia-Yu Pan, Wei Wei, Da-Cheng Juan · 08 Jul 2020 · 112 / 237 / 0
Pre-Trained Models for Heterogeneous Information Networks
Yang Fang, Xiang Zhao, Yifan Chen, W. Xiao, Maarten de Rijke · SSL · 07 Jul 2020 · 47 / 1 / 0
Efficient Conformal Prediction via Cascaded Inference with Expanded Admission
Adam Fisch, Tal Schuster, Tommi Jaakkola, Regina Barzilay · 06 Jul 2020 · 50 / 1 / 0
LMVE at SemEval-2020 Task 4: Commonsense Validation and Explanation using Pretraining Language Model
Shilei Liu, Yu Guo, Bochao Li, Feiliang Ren · LRM · 06 Jul 2020 · 81 / 4 / 0
Reading Comprehension in Czech via Machine Translation and Cross-lingual Transfer
K. Macková, Milan Straka · 03 Jul 2020 · 55 / 13 / 0
SemEval-2020 Task 4: Commonsense Validation and Explanation
Cunxiang Wang, Shuailong Liang, Yili Jin, Yilong Wang, Xiao-Dan Zhu, Yue Zhang · LRM · 01 Jul 2020 · 141 / 99 / 0
Transferability of Natural Language Inference to Biomedical Question Answering
Minbyul Jeong, Mujeen Sung, Gangwoo Kim, Donghyeon Kim, Wonjin Yoon, J. Yoo, Jaewoo Kang · 01 Jul 2020 · 80 / 40 / 0
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler · 30 Jun 2020 · 142 / 135 / 0
SE3M: A Model for Software Effort Estimation Using Pre-trained Embedding Models
E. M. D. B. Fávero, Dalcimar Casanova, Andrey R. Pimentel · 30 Jun 2020 · 34 / 12 / 0
Multi-Head Attention: Collaborate Instead of Concatenate
Jean-Baptiste Cordonnier, Andreas Loukas, Martin Jaggi · 29 Jun 2020 · 82 / 115 / 0
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret · 29 Jun 2020 · 218 / 1,798 / 0
BOND: BERT-Assisted Open-Domain Named Entity Recognition with Distant Supervision
Chen Liang, Yue Yu, Haoming Jiang, Siawpeng Er, Ruijia Wang, T. Zhao, Chao Zhang · OffRL · 28 Jun 2020 · 78 / 240 / 0
Video-Grounded Dialogues with Pretrained Generation Language Models
Hung Le, Guosheng Lin · 27 Jun 2020 · 82 / 28 / 0
BERTology Meets Biology: Interpreting Attention in Protein Language Models
Jesse Vig, Ali Madani, Lav Varshney, Caiming Xiong, R. Socher, Nazneen Rajani · 26 Jun 2020 · 110 / 295 / 0
Train and You'll Miss It: Interactive Model Iteration with Weak Supervision and Pre-Trained Embeddings
Mayee F. Chen, Daniel Y. Fu, Frederic Sala, Sen Wu, Ravi Teja Mullapudi, Fait Poms, Kayvon Fatahalian, Christopher Ré · 26 Jun 2020 · 61 / 10 / 0
The Depth-to-Width Interplay in Self-Attention
Yoav Levine, Noam Wies, Or Sharir, Hofit Bata, Amnon Shashua · 22 Jun 2020 · 137 / 46 / 0
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu, Yu Cheng, Zhe Gan, Furong Huang, Jingjing Liu, Tom Goldstein · ODL · 21 Jun 2020 · 109 / 2 / 0
SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
F. Iandola, Albert Eaton Shaw, Ravi Krishna, Kurt Keutzer · VLM · 19 Jun 2020 · 90 / 128 / 0
New Vietnamese Corpus for Machine Reading Comprehension of Health News Articles
Kiet Van Nguyen, Tin Van Huynh, Duc-Vu Nguyen, A. Nguyen, Ngan Luu-Thuy Nguyen · 19 Jun 2020 · 75 / 41 / 0
Neural Parameter Allocation Search
Bryan A. Plummer, Nikoli Dryden, Julius Frost, Torsten Hoefler, Kate Saenko · 18 Jun 2020 · 120 / 16 / 0
Self-supervised Learning for Speech Enhancement
Yuchun Wang, Shrikant Venkataramani, Paris Smaragdis · SSL · 18 Jun 2020 · 96 / 31 / 0
I-BERT: Inductive Generalization of Transformer to Arbitrary Context Lengths
Hyoungwook Nam, S. Seo, Vikram Sharma Malithody, Noor Michael, Lang Li · 18 Jun 2020 · 26 / 1 / 0
PERL: Pivot-based Domain Adaptation for Pre-trained Deep Contextualized Embedding Models
Eyal Ben-David, Carmel Rabinovitz, Roi Reichart · SSL · 16 Jun 2020 · 118 / 63 / 0
Self-supervised Learning: Generative or Contrastive
Xiao Liu, Fanjin Zhang, Zhenyu Hou, Zhaoyu Wang, Li Mian, Jing Zhang, Jie Tang · SSL · 15 Jun 2020 · 211 / 1,645 / 0
How to Avoid Being Eaten by a Grue: Structured Exploration Strategies for Textual Worlds
Prithviraj Ammanabrolu, Ethan Tien, Matthew J. Hausknecht, Mark O. Riedl · LLMAG · 12 Jun 2020 · 83 / 50 / 0
SemEval-2020 Task 12: Multilingual Offensive Language Identification in Social Media (OffensEval 2020)
Marcos Zampieri, Preslav Nakov, Sara Rosenthal, Pepa Atanasova, Georgi Karadzhov, Hamdy Mubarak, Leon Derczynski, Zeses Pitenis, Çağrı Çöltekin · 12 Jun 2020 · 81 / 488 / 0
A Practical Sparse Approximation for Real Time Recurrent Learning
Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves · 12 Jun 2020 · 92 / 32 / 0
NanoFlow: Scalable Normalizing Flows with Sublinear Parameter Complexity
Sang-gil Lee, Sungwon Kim, Sungroh Yoon · 11 Jun 2020 · 77 / 17 / 0
A Monolingual Approach to Contextualized Word Embeddings for Mid-Resource Languages
Pedro Ortiz Suarez, Laurent Romary, Benoît Sagot · 11 Jun 2020 · 73 / 234 / 0
Revisiting Few-sample BERT Fine-tuning
Tianyi Zhang, Felix Wu, Arzoo Katiyar, Kilian Q. Weinberger, Yoav Artzi · 10 Jun 2020 · 180 / 446 / 0
MC-BERT: Efficient Language Pre-Training via a Meta Controller
Zhenhui Xu, Linyuan Gong, Guolin Ke, Di He, Shuxin Zheng, Liwei Wang, Jiang Bian, Tie-Yan Liu · BDL · 10 Jun 2020 · 65 / 18 / 0
On the Stability of Fine-tuning BERT: Misconceptions, Explanations, and Strong Baselines
Marius Mosbach, Maksym Andriushchenko, Dietrich Klakow · 08 Jun 2020 · 187 / 363 / 0
Pre-training Polish Transformer-based Language Models at Scale
Slawomir Dadas, Michal Perelkiewicz, Rafal Poswiata · 07 Jun 2020 · 98 / 39 / 0
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei · 07 Jun 2020 · 79 / 343 / 0
An Overview of Neural Network Compression
James O'Neill · AI4CE · 05 Jun 2020 · 160 / 99 / 0
DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations
John Giorgi, Osvald Nitski, Bo Wang, Gary D. Bader · SSL · 05 Jun 2020 · 151 / 499 / 0
DeBERTa: Decoding-enhanced BERT with Disentangled Attention
Pengcheng He, Xiaodong Liu, Jianfeng Gao, Weizhu Chen · AAML · 05 Jun 2020 · 187 / 2,770 / 0
GMAT: Global Memory Augmentation for Transformers
Ankit Gupta, Jonathan Berant · RALM · 05 Jun 2020 · 81 / 50 / 0
Understanding Self-Attention of Self-Supervised Audio Transformers
Shu-Wen Yang, Andy T. Liu, Hung-yi Lee · 05 Jun 2020 · 55 / 27 / 0
Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le · 05 Jun 2020 · 109 / 236 / 0
Position Masking for Language Models
Andy Wagner, T. Mitra, Mrinal Iyer, Godfrey Da Costa, Marc Tremblay · 02 Jun 2020 · 22 / 5 / 0
Subjective Question Answering: Deciphering the inner workings of Transformers in the realm of subjectivity
Lukas Muttenthaler · 02 Jun 2020 · 41 / 3 / 0
WikiBERT models: deep transfer learning for many languages
S. Pyysalo, Jenna Kanerva, Antti Virtanen, Filip Ginter · KELM · 02 Jun 2020 · 89 / 38 / 0
Question Answering on Scholarly Knowledge Graphs
M. Y. Jaradeh, M. Stocker, Sören Auer · LMTD, RALM · 02 Jun 2020 · 41 / 13 / 0
Careful analysis of XRD patterns with Attention
Koichi Kano, T. Segi, H. Ozono · 02 Jun 2020 · 27 / 0 / 0
A Pairwise Probe for Understanding BERT Fine-Tuning on Machine Reading Comprehension
Jie Cai, Zhengzhou Zhu, Ping Nie, Qian Liu · AAML · 02 Jun 2020 · 26 / 7 / 0