Patient Knowledge Distillation for BERT Model Compression
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
25 August 2019 · arXiv:1908.09355
Papers citing "Patient Knowledge Distillation for BERT Model Compression" (50 of 492 papers shown)
ALP-KD: Attention-Based Layer Projection for Knowledge Distillation
Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu
27 Dec 2020 · 21 / 122 / 0

Learning Light-Weight Translation Models from Deep Transformer
Bei Li, Ziyang Wang, Hui Liu, Quan Du, Tong Xiao, Chunliang Zhang, Jingbo Zhu
27 Dec 2020 · VLM · 120 / 40 / 0

A Survey on Visual Transformer
Kai Han, Yunhe Wang, Hanting Chen, Xinghao Chen, Jianyuan Guo, ..., Chunjing Xu, Yixing Xu, Zhaohui Yang, Yiman Zhang, Dacheng Tao
23 Dec 2020 · ViT · 23 / 2,132 / 0

Undivided Attention: Are Intermediate Layers Necessary for BERT?
S. N. Sridhar, Anthony Sarah
22 Dec 2020 · 24 / 14 / 0

Wasserstein Contrastive Representation Distillation
Liqun Chen, Dong Wang, Zhe Gan, Jingjing Liu, Ricardo Henao, Lawrence Carin
15 Dec 2020 · 20 / 93 / 0

Parameter-Efficient Transfer Learning with Diff Pruning
Demi Guo, Alexander M. Rush, Yoon Kim
14 Dec 2020 · 13 / 385 / 0

LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li
14 Dec 2020 · 52 / 57 / 0

Reinforced Multi-Teacher Selection for Knowledge Distillation
Fei Yuan, Linjun Shou, J. Pei, Wutao Lin, Ming Gong, Yan Fu, Daxin Jiang
11 Dec 2020 · 15 / 121 / 0

Improving Task-Agnostic BERT Distillation with Layer Mapping Search
Xiaoqi Jiao, Huating Chang, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, Fang Wang, Qun Liu
11 Dec 2020 · 29 / 12 / 0

Meta-KD: A Meta Knowledge Distillation Framework for Language Model Compression across Domains
Haojie Pan, Chengyu Wang, Minghui Qiu, Yichang Zhang, Yaliang Li, Jun Huang
02 Dec 2020 · 23 / 49 / 0

EasyTransfer -- A Simple and Scalable Deep Transfer Learning Platform for NLP Applications
Minghui Qiu, Peng Li, Chengyu Wang, Hanjie Pan, Yaliang Li, ..., Jun Yang, Yaliang Li, Jun Huang, Deng Cai, Wei Lin
18 Nov 2020 · VLM, SyDa · 39 / 20 / 0

Know What You Don't Need: Single-Shot Meta-Pruning for Attention Heads
Zhengyan Zhang, Fanchao Qi, Zhiyuan Liu, Qun Liu, Maosong Sun
07 Nov 2020 · VLM · 41 / 30 / 0

Sound Natural: Content Rephrasing in Dialog Systems
Arash Einolghozati, Anchit Gupta, K. Diedrick, S. Gupta
03 Nov 2020 · 23 / 6 / 0

MixKD: Towards Efficient Distillation of Large-scale Language Models
Kevin J Liang, Weituo Hao, Dinghan Shen, Yufan Zhou, Weizhu Chen, Changyou Chen, Lawrence Carin
01 Nov 2020 · 13 / 73 / 0

Improved Synthetic Training for Reading Comprehension
Yanda Chen, Md Arafat Sultan, T. J. W. R. Center
24 Oct 2020 · SyDa · 29 / 5 / 0

Optimal Subarchitecture Extraction For BERT
Adrian de Wynter, Daniel J. Perry
20 Oct 2020 · MQ · 51 / 18 / 0

BERT2DNN: BERT Distillation with Massive Unlabeled Data for Online E-Commerce Search
Yunjiang Jiang, Yue Shang, Ziyang Liu, Hongwei Shen, Yun Xiao, Wei Xiong, Sulong Xu, Weipeng P. Yan, Di Jin
20 Oct 2020 · 29 / 17 / 0

AutoADR: Automatic Model Design for Ad Relevance
Yiren Chen, Yaming Yang, Hong Sun, Yujing Wang, Yu Xu, Wei Shen, Rong Zhou, Yunhai Tong, Jing Bai, Ruofei Zhang
14 Oct 2020 · 42 / 3 / 0

Length-Adaptive Transformer: Train Once with Length Drop, Use Anytime with Search
Gyuwan Kim, Kyunghyun Cho
14 Oct 2020 · 37 / 94 / 0

Weight Squeezing: Reparameterization for Knowledge Transfer and Model Compression
Artem Chumachenko, Daniil Gavrilov, Nikita Balagansky, Pavel Kalaidin
14 Oct 2020 · 13 / 0 / 0

Pretrained Transformers for Text Ranking: BERT and Beyond
Jimmy J. Lin, Rodrigo Nogueira, Andrew Yates
13 Oct 2020 · VLM · 244 / 612 / 0

BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance
Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin
13 Oct 2020 · 17 / 54 / 0

Load What You Need: Smaller Versions of Multilingual BERT
Amine Abdaoui, Camille Pradel, Grégoire Sigel
12 Oct 2020 · 47 / 72 / 0

Adversarial Self-Supervised Data-Free Distillation for Text Classification
Xinyin Ma, Yongliang Shen, Gongfan Fang, Chen Chen, Chenghao Jia, Weiming Lu
10 Oct 2020 · 30 / 24 / 0

Deep Learning Meets Projective Clustering
Alaa Maalouf, Harry Lang, Daniela Rus, Dan Feldman
08 Oct 2020 · 24 / 9 / 0

Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu
06 Oct 2020 · MoE · 17 / 34 / 0

Regularizing Dialogue Generation by Imitating Implicit Scenarios
Shaoxiong Feng, Xuancheng Ren, Hongshen Chen, Bin Sun, Kan Li, Xu Sun
05 Oct 2020 · 18 / 20 / 0

Pruning Redundant Mappings in Transformer Models via Spectral-Normalized Identity Prior
Zi Lin, Jeremiah Zhe Liu, Ziao Yang, Nan Hua, Dan Roth
05 Oct 2020 · 30 / 46 / 0

Which *BERT? A Survey Organizing Contextualized Encoders
Patrick Xia, Shijie Wu, Benjamin Van Durme
02 Oct 2020 · 26 / 50 / 0

Pea-KD: Parameter-efficient and Accurate Knowledge Distillation on BERT
Ikhyun Cho, U. Kang
30 Sep 2020 · 4 / 1 / 0

Contrastive Distillation on Intermediate Representations for Language Model Compression
S. Sun, Zhe Gan, Yu Cheng, Yuwei Fang, Shuohang Wang, Jingjing Liu
29 Sep 2020 · VLM · 28 / 69 / 0

TernaryBERT: Distillation-aware Ultra-low Bit BERT
Wei Zhang, Lu Hou, Yichun Yin, Lifeng Shang, Xiao Chen, Xin Jiang, Qun Liu
27 Sep 2020 · MQ · 33 / 208 / 0

Efficient Transformer-based Large Scale Language Representations using Hardware-friendly Block Structured Pruning
Bingbing Li, Zhenglun Kong, Tianyun Zhang, Ji Li, ZeLin Li, Hang Liu, Caiwen Ding
17 Sep 2020 · VLM · 32 / 64 / 0

Simplified TinyBERT: Knowledge Distillation for Document Retrieval
Xuanang Chen, Xianpei Han, Kai Hui, Le Sun, Yingfei Sun
16 Sep 2020 · 17 / 25 / 0

Mimic and Conquer: Heterogeneous Tree Structure Distillation for Syntactic NLP
Hao Fei, Yafeng Ren, Donghong Ji
16 Sep 2020 · 30 / 24 / 0

DualDE: Dually Distilling Knowledge Graph Embedding for Faster and Cheaper Reasoning
Yushan Zhu, Wen Zhang, Mingyang Chen, Hui Chen, Xu-Xin Cheng, Wei Zhang, Huajun Chen (Zhejiang University)
13 Sep 2020 · 22 / 15 / 0

Compressed Deep Networks: Goodbye SVD, Hello Robust Low-Rank Approximation
M. Tukan, Alaa Maalouf, Matan Weksler, Dan Feldman
11 Sep 2020 · 23 / 9 / 0

Accelerating Real-Time Question Answering via Question Generation
Yuwei Fang, Shuohang Wang, Zhe Gan, S. Sun, Jingjing Liu, Chenguang Zhu
10 Sep 2020 · OnRL · 15 / 16 / 0

Compression of Deep Learning Models for Text: A Survey
Manish Gupta, Puneet Agrawal
12 Aug 2020 · VLM, MedIm, AI4CE · 22 / 115 / 0

Understanding BERT Rankers Under Distillation
Luyu Gao, Zhuyun Dai, Jamie Callan
21 Jul 2020 · 14 / 49 / 0

Knowledge Distillation in Deep Learning and its Applications
Abdolmaged Alkhulaifi, Fahad Alsahli, Irfan Ahmad
17 Jul 2020 · FedML · 28 / 76 / 0

Extracurricular Learning: Knowledge Transfer Beyond Empirical Distribution
Hadi Pouransari, Mojan Javaheripi, Vinay Sharma, Oncel Tuzel
30 Jun 2020 · 14 / 5 / 0

SqueezeBERT: What can computer vision teach NLP about efficient neural networks?
F. Iandola, Albert Eaton Shaw, Ravi Krishna, Kurt Keutzer
19 Jun 2020 · VLM · 28 / 127 / 0

Knowledge Distillation: A Survey
Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao
09 Jun 2020 · VLM · 19 / 2,851 / 0

BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei
07 Jun 2020 · 8 / 331 / 0

An Overview of Neural Network Compression
James O'Neill
05 Jun 2020 · AI4CE · 45 / 98 / 0

Transferring Inductive Biases through Knowledge Distillation
Samira Abnar, Mostafa Dehghani, Willem H. Zuidema
31 May 2020 · 33 / 57 / 0

Distilling Knowledge from Ensembles of Acoustic Models for Joint CTC-Attention End-to-End Speech Recognition
Yan Gao, Titouan Parcollet, Nicholas D. Lane
19 May 2020 · VLM · 17 / 13 / 0

Distilling Knowledge from Pre-trained Language Models via Text Smoothing
Xing Wu, Y. Liu, Xiangyang Zhou, Dianhai Yu
08 May 2020 · 20 / 6 / 0

GOBO: Quantizing Attention-Based NLP Models for Low Latency and Energy Efficient Inference
Ali Hadi Zadeh, Isak Edo, Omar Mohamed Awad, Andreas Moshovos
08 May 2020 · MQ · 30 / 183 / 0