Small and Practical BERT Models for Sequence Labeling
Henry Tsai, Jason Riesa, Melvin Johnson, N. Arivazhagan, Xin Li, Amelia Archer
arXiv:1909.00100 · 31 August 2019
Papers citing "Small and Practical BERT Models for Sequence Labeling" (16 of 66 papers shown)
GoEmotions: A Dataset of Fine-Grained Emotions
Dorottya Demszky, Dana Movshovitz-Attias, Jeongwoo Ko, Alan S. Cowen, Gaurav Nemade, Sujith Ravi
01 May 2020

Enriched Pre-trained Transformers for Joint Slot Filling and Intent Detection
Momchil Hardalov, Ivan Koychev, Preslav Nakov
30 Apr 2020

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
Subhabrata Mukherjee, Ahmed Hassan Awadallah
12 Apr 2020

Analyzing Redundancy in Pretrained Transformer Models
Fahim Dalvi, Hassan Sajjad, Nadir Durrani, Yonatan Belinkov
08 Apr 2020

Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Xinyu Wang, Yong Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu
08 Apr 2020

On the Effect of Dropping Layers of Pre-trained Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
08 Apr 2020

Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
Bowen Wu, Huan Zhang, Mengyuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang
07 Apr 2020

Multi-Step Inference for Reasoning Over Paragraphs
Jiangming Liu, Matt Gardner, Shay B. Cohen, Mirella Lapata
06 Apr 2020

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
06 Apr 2020

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
18 Mar 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky
27 Feb 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
25 Feb 2020

Structured Pruning of a BERT-based Question Answering Model
J. Scott McCarley, Rishav Chakravarti, Avirup Sil
14 Oct 2019

Training Compact Models for Low Resource Entity Tagging using Pre-trained Language Models
Peter Izsak, Shira Guskin, Moshe Wasserblat
14 Oct 2019

Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei
10 Oct 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
02 Oct 2019