MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
arXiv:2004.02984 · 6 April 2020
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
Tags: MQ

Papers citing "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices"
(26 of 176 citing papers shown)
Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth
Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas · 05 Mar 2021

Improving Computational Efficiency in Visual Reinforcement Learning via Stored Embeddings
Lili Chen, Kimin Lee, A. Srinivas, Pieter Abbeel · 04 Mar 2021 · Tags: OffRL

SEED: Self-supervised Distillation For Visual Representation
Zhiyuan Fang, Jianfeng Wang, Lijuan Wang, Lei Zhang, Yezhou Yang, Zicheng Liu · 12 Jan 2021 · Tags: SSL

I-BERT: Integer-only BERT Quantization
Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer · 05 Jan 2021 · Tags: MQ

Reservoir Transformers
Sheng Shen, Alexei Baevski, Ari S. Morcos, Kurt Keutzer, Michael Auli, Douwe Kiela · 30 Dec 2020

ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu · 27 Dec 2020

LiteMuL: A Lightweight On-Device Sequence Tagger using Multi-task Learning
S. Kumari, Vibhav Agarwal, B. Challa, Kranti Chalamalasetti, Sourav Ghosh, Harshavardhana, Barath Raj Kandur Raja · 15 Dec 2020

Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo, Alexander M. Rush, Yoon Kim · 14 Dec 2020

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li · 14 Dec 2020

MiniVLM: A Smaller and Faster Vision-Language Model
Jianfeng Wang, Xiaowei Hu, Pengchuan Zhang, Xiujun Li, Lijuan Wang, Lefei Zhang, Jianfeng Gao, Zicheng Liu · 13 Dec 2020 · Tags: VLM, MLLM

Two Stage Transformer Model for COVID-19 Fake News Detection and Fact Checking
Rutvik Vijjali, Prathyush Potluri, S. Kumar, Sundeep Teki · 26 Nov 2020 · Tags: MedIm

Bringing AI To Edge: From Deep Learning's Perspective
Di Liu, Hao Kong, Xiangzhong Luo, Weichen Liu, Ravi Subramaniam · 25 Nov 2020

Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads
Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun · 07 Nov 2020 · Tags: VLM

Federated Knowledge Distillation
Hyowoon Seo, Jihong Park, Seungeun Oh, M. Bennis, Seong-Lyun Kim · 04 Nov 2020 · Tags: FedML

MixKD: Towards Efficient Distillation of Large-scale Language Models
Kevin J. Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin · 01 Nov 2020

Pre-trained Summarization Distillation
Sam Shleifer, Alexander M. Rush · 24 Oct 2020

Rethinking embedding coupling in pre-trained language models
Hyung Won Chung, Thibault Févry, Henry Tsai, Melvin Johnson, Sebastian Ruder · 24 Oct 2020

AdapterDrop: On the Efficiency of Adapters in Transformers
Andreas Rucklé, Gregor Geigle, Max Glockner, Tilman Beck, Jonas Pfeiffer, Nils Reimers, Iryna Gurevych · 22 Oct 2020

Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu · 06 Oct 2020 · Tags: MoE

Compression of Deep Learning Models for Text: A Survey
Manish Gupta, Puneet Agrawal · 12 Aug 2020 · Tags: VLM, MedIm, AI4CE

ConvBERT: Improving BERT with Span-based Dynamic Convolution
Zihang Jiang, Weihao Yu, Daquan Zhou, Yunpeng Chen, Jiashi Feng, Shuicheng Yan · 06 Aug 2020

Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing
Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le · 05 Jun 2020

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos · 08 May 2020 · Tags: MQ

DeFormer: Decomposing Pre-trained Transformers for Faster Question Answering
Qingqing Cao, H. Trivedi, A. Balasubramanian, Niranjan Balasubramanian · 02 May 2020

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang · 18 Mar 2020 · Tags: LM&MA, VLM

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · 20 Apr 2018 · Tags: ELM