MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
arXiv: 2004.02984 · 6 April 2020
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
Tags: MQ
Papers citing "MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices" (50 of 176 shown)
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun · 03 Feb 2023

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu · 01 Feb 2023

Gradient-based Intra-attention Pruning on Pre-trained Language Models
Ziqing Yang, Yiming Cui, Xin Yao, Shijin Wang · VLM · 15 Dec 2022

Selective Amnesia: On Efficient, High-Fidelity and Blind Suppression of Backdoor Effects in Trojaned Machine Learning Models
Rui Zhu, Di Tang, Siyuan Tang, Xiaofeng Wang, Haixu Tang · AAML, FedML · 09 Dec 2022

SPARTAN: Sparse Hierarchical Memory for Parameter-Efficient Transformers
Ameet Deshpande, Md Arafat Sultan, Anthony Ferritto, Ashwin Kalyan, Karthik Narasimhan, Avirup Sil · MoE · 29 Nov 2022

Knowledge distillation for fast and accurate DNA sequence correction
Anastasiya Belyaeva, Joel Shor, Daniel E. Cook, Kishwar Shafin, Daniel Liu, ..., Alexey Kolesnikov, Cory Y. McLean, Maria Nattestad, Andrew Carroll, Pi-Chuan Chang · 17 Nov 2022

A Survey for Efficient Open Domain Question Answering
Qin Zhang, Shan Chen, Dongkuan Xu, Qingqing Cao, Xiaojun Chen, Trevor Cohn, Meng Fang · 15 Nov 2022

FPT: Improving Prompt Tuning Efficiency via Progressive Training
Yufei Huang, Yujia Qin, Huadong Wang, Yichun Yin, Maosong Sun, Zhiyuan Liu, Qun Liu · VLM, LRM · 13 Nov 2022

Efficiently Scaling Transformer Inference
Reiner Pope, Sholto Douglas, Aakanksha Chowdhery, Jacob Devlin, James Bradbury, Anselm Levskaya, Jonathan Heek, Kefan Xiao, Shivani Agrawal, J. Dean · 09 Nov 2022

Gradient Knowledge Distillation for Pre-trained Language Models
Lean Wang, Lei Li, Xu Sun · VLM · 02 Nov 2022

BEBERT: Efficient and Robust Binary Ensemble BERT
Jiayi Tian, Chao Fang, Hong Wang, Zhongfeng Wang · MQ · 28 Oct 2022

COST-EFF: Collaborative Optimization of Spatial and Temporal Efficiency with Slenderized Multi-exit Language Models
Bowen Shen, Zheng Lin, Yuanxin Liu, Zhengxiao Liu, Lei Wang, Weiping Wang · VLM · 27 Oct 2022

Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
Ziqi Wang, Yuexin Wu, Frederick Liu, Daogao Liu, Le Hou, Hongkun Yu, Jing Li, Heng Ji · 21 Oct 2022

Efficiently Controlling Multiple Risks with Pareto Testing
Bracha Laufer-Goldshtein, Adam Fisch, Regina Barzilay, Tommi Jaakkola · 14 Oct 2022

InFi: End-to-End Learning to Filter Input for Resource-Efficiency in Mobile-Centric Inference
Mu Yuan, Lan Zhang, Fengxiang He, Xueting Tong, Miao-Hui Song, Zhengyuan Xu, Xiang-Yang Li · 28 Sep 2022

Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching
Kunbo Ding, Weijie Liu, Yuejian Fang, Zhe Zhao, Qi Ju, Xuefeng Yang · 13 Sep 2022

Activity report analysis with automatic single or multispan answer extraction
R. Choudhary, A. Sridhar, Erik M. Visser · 09 Sep 2022

Efficient Methods for Natural Language Processing: A Survey
Marcos Vinícius Treviso, Ji-Ung Lee, Tianchu Ji, Betty van Aken, Qingqing Cao, ..., Emma Strubell, Niranjan Balasubramanian, Leon Derczynski, Iryna Gurevych, Roy Schwartz · 31 Aug 2022

PANDA: Prompt Transfer Meets Knowledge Distillation for Efficient Model Adaptation
Qihuang Zhong, Liang Ding, Juhua Liu, Bo Du, Dacheng Tao · VLM, CLL · 22 Aug 2022

Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers
Ji Xin, Raphael Tang, Zhiying Jiang, Yaoliang Yu, Jimmy J. Lin · 31 Jul 2022

Device-Cloud Collaborative Recommendation via Meta Controller
Jiangchao Yao, Feng Wang, Xichen Ding, Shaohu Chen, Bo Han, Jingren Zhou, Hongxia Yang · 07 Jul 2022

Knowledge Distillation of Transformer-based Language Models Revisited
Chengqiang Lu, Jianwei Zhang, Yunfei Chu, Zhengyu Chen, Jingren Zhou, Fei Wu, Haiqing Chen, Hongxia Yang · VLM · 29 Jun 2022

All Mistakes Are Not Equal: Comprehensive Hierarchy Aware Multi-label Predictions (CHAMP)
A. Vaswani, Gaurav Aggarwal, Praneeth Netrapalli, N. Hegde · 17 Jun 2022

ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers
Z. Yao, Reza Yazdani Aminabadi, Minjia Zhang, Xiaoxia Wu, Conglong Li, Yuxiong He · VLM, MQ · 04 Jun 2022

A Closer Look at Self-Supervised Lightweight Vision Transformers
Shaoru Wang, Jin Gao, Zeming Li, Jian Sun, Weiming Hu · ViT · 28 May 2022

A Fast Attention Network for Joint Intent Detection and Slot Filling on Edge Devices
Liang Huang, Senjie Liang, Feiyang Ye, Nan Gao · 16 May 2022

Chemical transformer compression for accelerating both training and inference of molecular modeling
Yi Yu, K. Börjesson · 16 May 2022

Adaptable Adapters
N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych · 03 May 2022

Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, ..., Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan · VLM, OffRL · 22 Apr 2022

MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, T. Zhao, Weizhu Chen · MoE · 15 Apr 2022

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart · CLL · 15 Apr 2022

MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan · ViT · 14 Apr 2022

Redwood: Using Collision Detection to Grow a Large-Scale Intent Classification Dataset
Stefan Larson, Kevin Leach · 12 Apr 2022

Searching for Efficient Neural Architectures for On-Device ML on Edge TPUs
Berkin Akin, Suyog Gupta, Yun Long, Anton Spiridonov, Zhuo Wang, Marie White, Haonan Xu, Ping Zhou, Yanqi Zhou · 09 Apr 2022

Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia, Zexuan Zhong, Danqi Chen · VLM · 01 Apr 2022

Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao, Lu Hou, Wei Zhang, Lifeng Shang, Xin Jiang, Qun Liu, Ping Luo, Ngai Wong · MQ · 21 Mar 2022

Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová, Marian Verhelst · 20 Mar 2022

Dynamic N:M Fine-grained Structured Sparse Attention Mechanism
Zhaodong Chen, Yuying Quan, Zheng Qu, L. Liu, Yufei Ding, Yuan Xie · 28 Feb 2022

Short-answer scoring with ensembles of pretrained language models
Christopher M. Ormerod · 23 Feb 2022

ZeroGen: Efficient Zero-shot Learning via Dataset Generation
Jiacheng Ye, Jiahui Gao, Qintong Li, Hang Xu, Jiangtao Feng, Zhiyong Wu, Tao Yu, Lingpeng Kong · SyDa · 16 Feb 2022

A Survey on Model Compression and Acceleration for Pretrained Language Models
Canwen Xu, Julian McAuley · 15 Feb 2022

Constrained Optimization with Dynamic Bound-scaling for Effective NLP Backdoor Defense
Guangyu Shen, Yingqi Liu, Guanhong Tao, Qiuling Xu, Zhuo Zhang, Shengwei An, Shiqing Ma, Xinming Zhang · AAML · 11 Feb 2022

pNLP-Mixer: an Efficient all-MLP Architecture for Language
Francesco Fusco, Damian Pascual, Peter W. J. Staar, Diego Antognini · 09 Feb 2022

AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao · 29 Jan 2022

Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti · 15 Jan 2022

DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He · 14 Jan 2022

Transfer-Tuning: Reusing Auto-Schedules for Efficient Tensor Program Code Generation
Perry Gibson, José Cano · 14 Jan 2022

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang · ELM · 23 Dec 2021

Distilling the Knowledge of Romanian BERTs Using Multiple Teachers
Andrei-Marius Avram, Darius Catrina, Dumitru-Clementin Cercel, Mihai Dascălu, Traian Rebedea, Vasile Păiș, Dan Tufiș · 23 Dec 2021

Pruning Pretrained Encoders with a Multitask Objective
Patrick Xia, Richard Shin · 10 Dec 2021