MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
arXiv:2004.02984 · 6 April 2020 [MQ]
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
Papers citing "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices"
50 / 176 papers shown
• VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction (08 Dec 2021)
  Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Xu, Wei Wu, Enhong Chen
• NN-LUT: Neural Approximation of Non-Linear Operations for Efficient Transformer Inference (03 Dec 2021)
  Joonsang Yu, Junki Park, Seongmin Park, Minsoo Kim, Sihwa Lee, Dong Hyun Lee, Jungwook Choi
• Hierarchical Knowledge Distillation for Dialogue Sequence Labeling (22 Nov 2021)
  Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
• Character-level HyperNetworks for Hate Speech Detection (11 Nov 2021)
  Tomer Wullach, A. Adler, Einat Minkov
• Prune Once for All: Sparse Pre-Trained Language Models (10 Nov 2021) [VLM]
  Ofir Zafrir, Ariel Larey, Guy Boudoukh, Haihao Shen, Moshe Wasserblat
• Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning (30 Oct 2021)
  Xuanli He, I. Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi
• NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM (28 Oct 2021)
  Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu
• Vis-TOP: Visual Transformer Overlay Processor (21 Oct 2021) [BDL, ViT]
  Wei Hu, Dian Xu, Zimeng Fan, Fang Liu, Yanxiang He
• Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models (16 Oct 2021)
  Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech
• Kronecker Decomposition for GPT Compression (15 Oct 2021)
  Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh
• SPoT: Better Frozen Model Adaptation through Soft Prompt Transfer (15 Oct 2021) [VLM, LRM]
  Tu Vu, Brian Lester, Noah Constant, Rami Al-Rfou, Daniel Cer
• Towards Efficient NLP: A Standard Evaluation and A Strong Baseline (13 Oct 2021) [ELM]
  Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu
• MoEfication: Transformer Feed-forward Layers are Mixtures of Experts (05 Oct 2021) [MoE]
  Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
• FQuAD2.0: French Question Answering and knowing that you know nothing (27 Sep 2021)
  Quentin Heinrich, Gautier Viaud, Wacim Belblidia
• Understanding and Overcoming the Challenges of Efficient Transformer Quantization (27 Sep 2021) [MQ]
  Yelysei Bondarenko, Markus Nagel, Tijmen Blankevoort
• Improving Question Answering Performance Using Knowledge Distillation and Active Learning (26 Sep 2021)
  Yasaman Boreshban, Seyed Morteza Mirbostani, Gholamreza Ghassem-Sani, Seyed Abolghasem Mirroshandel, Shahin Amiriparian
• RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation (21 Sep 2021)
  Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart
• Knowledge Distillation with Noisy Labels for Natural Language Understanding (21 Sep 2021)
  Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh
• EfficientBERT: Progressively Searching Multilayer Perceptron via Warm-up Knowledge Distillation (15 Sep 2021)
  Chenhe Dong, Guangrun Wang, Hang Xu, Jiefeng Peng, Xiaozhe Ren, Xiaodan Liang
• Will this Question be Answered? Question Filtering via Answer Model Distillation for Efficient Question Answering (14 Sep 2021)
  Siddhant Garg, Alessandro Moschitti
• KroneckerBERT: Learning Kronecker Decomposition for Pre-trained Language Models via Knowledge Distillation (13 Sep 2021)
  Marzieh S. Tahaei, Ella Charlaix, V. Nia, A. Ghodsi, Mehdi Rezagholizadeh
• How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding (13 Sep 2021) [AAML]
  Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh
• Compute and Energy Consumption Trends in Deep Learning Inference (12 Sep 2021)
  Radosvet Desislavov, Fernando Martínez-Plumed, José Hernández-Orallo
• Block Pruning For Faster Transformers (10 Sep 2021) [VLM]
  François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush
• Learning to Teach with Student Feedback (10 Sep 2021) [VLM]
  Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang
• PIMNet: A Parallel, Iterative and Mimicking Network for Scene Text Recognition (09 Sep 2021)
  Zhi Qiao, Yu Zhou, Jin Wei, Wei Wang, Yuanqing Zhang, Ning Jiang, Hongbin Wang, Weiping Wang
• What's Hidden in a One-layer Randomly Weighted Transformer? (08 Sep 2021)
  Sheng Shen, Z. Yao, Douwe Kiela, Kurt Keutzer, Michael W. Mahoney
• Beyond Preserved Accuracy: Evaluating Loyalty and Robustness of BERT Compression (07 Sep 2021)
  Canwen Xu, Wangchunshu Zhou, Tao Ge, Kelvin J. Xu, Julian McAuley, Furu Wei
• Sequential Attention Module for Natural Language Processing (07 Sep 2021) [AI4TS]
  Mengyuan Zhou, Jian Ma, Haiqing Yang, Lian-Xin Jiang, Yang Mo
• DKM: Differentiable K-Means Clustering Layer for Neural Network Compression (28 Aug 2021)
  Minsik Cho, Keivan Alizadeh Vahid, Saurabh N. Adya, Mohammad Rastegari
• Greenformers: Improving Computation and Memory Efficiency in Transformer Models via Low-Rank Approximation (24 Aug 2021)
  Samuel Cahyawijaya
• AutoFL: Enabling Heterogeneity-Aware Energy Efficient Federated Learning (16 Jul 2021)
  Young Geun Kim, Carole-Jean Wu
• FLAT: An Optimized Dataflow for Mitigating Attention Bottlenecks (13 Jul 2021)
  Sheng-Chun Kao, Suvinay Subramanian, Gaurav Agrawal, Amir Yazdanbakhsh, T. Krishna
• Learned Token Pruning for Transformers (02 Jul 2021)
  Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer
• Open, Sesame! Introducing Access Control to Voice Services (27 Jun 2021) [AAML]
  Dominika Woszczyk, Alvin Lee, Soteris Demetriou
• Teacher's pet: understanding and mitigating biases in distillation (19 Jun 2021)
  Michal Lukasik, Srinadh Bhojanapalli, A. Menon, Sanjiv Kumar
• Efficient Deep Learning: A Survey on Making Deep Learning Models Smaller, Faster, and Better (16 Jun 2021) [VLM, MedIm]
  Gaurav Menghani
• XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation (08 Jun 2021)
  Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao
• How Good Is NLP? A Sober Look at NLP Tasks through the Lens of Social Impact (04 Jun 2021)
  Zhijing Jin, Geeticka Chauhan, Brian Tse, Mrinmaya Sachan, Rada Mihalcea
• ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression (04 Jun 2021)
  Weiyue Su, Xuyi Chen, Shi Feng, Jiaxiang Liu, Weixin Liu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang
• A Compression-Compilation Framework for On-mobile Real-time BERT Applications (30 May 2021) [MQ]
  Wei Niu, Zhenglun Kong, Geng Yuan, Weiwen Jiang, Jiexiong Guan, Caiwen Ding, Pu Zhao, Sijia Liu, Bin Ren, Yanzhi Wang
• TR-BERT: Dynamic Token Reduction for Accelerating BERT Inference (25 May 2021) [MQ]
  Deming Ye, Yankai Lin, Yufei Huang, Maosong Sun
• Retrieval-Free Knowledge-Grounded Dialogue Response Generation with Adapters (13 May 2021) [RALM]
  Yan Xu, Etsuko Ishii, Samuel Cahyawijaya, Zihan Liu, Genta Indra Winata, Andrea Madotto, Dan Su, Pascale Fung
• MATE-KD: Masked Adversarial TExt, a Companion to Knowledge Distillation (12 May 2021) [AAML]
  Ahmad Rashid, Vasileios Lioutas, Mehdi Rezagholizadeh
• Disfluency Detection with Unlabeled Data and Small BERT Models (21 Apr 2021)
  Johann C. Rocholl, Vicky Zayats, D. D. Walker, Noah B. Murad, Aaron Schneider, Daniel J. Liebling
• Compressing Visual-linguistic Model via Knowledge Distillation (05 Apr 2021) [VLM]
  Zhiyuan Fang, Jianfeng Wang, Xiaowei Hu, Lijuan Wang, Yezhou Yang, Zicheng Liu
• Keyword Transformer: A Self-Attention Model for Keyword Spotting (01 Apr 2021)
  Axel Berg, Mark O'Connor, M. T. Cruz
• Finetuning Pretrained Transformers into RNNs (24 Mar 2021)
  Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith
• The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures (23 Mar 2021) [AI4TS]
  Sushant Singh, A. Mahmood
• ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques (21 Mar 2021) [VLM, MQ]
  Yuanxin Liu, Zheng Lin, Fengcheng Yuan