Patient Knowledge Distillation for BERT Model Compression (arXiv:1908.09355)
25 August 2019
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu

Papers citing "Patient Knowledge Distillation for BERT Model Compression" (42 of 492 papers shown)
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu
Topics: MLLM, VLM, OffRL, AI4TS · Citations: 493 · 01 May 2020

LightPAFF: A Two-Stage Distillation Framework for Pre-training and Fine-tuning
Kaitao Song, Hao Sun, Xu Tan, Tao Qin, Jianfeng Lu, Hongzhi Liu, Tie-Yan Liu
Citations: 25 · 27 Apr 2020

Training with Quantization Noise for Extreme Model Compression
Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin
Topics: MQ · Citations: 242 · 15 Apr 2020

XtremeDistil: Multi-stage Distillation for Massive Multilingual Models
Subhabrata Mukherjee, Ahmed Hassan Awadallah
Citations: 56 · 12 Apr 2020

LadaBERT: Lightweight Adaptation of BERT through Hybrid Model Compression
Yihuan Mao, Yujing Wang, Chufan Wu, Chen Zhang, Yang-Feng Wang, Yaming Yang, Quanlu Zhang, Yunhai Tong, Jing Bai
Citations: 72 · 08 Apr 2020

DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu
Topics: MQ · Citations: 319 · 08 Apr 2020

Structure-Level Knowledge Distillation For Multilingual Sequence Labeling
Xinyu Wang, Yong-jia Jiang, Nguyen Bach, Tao Wang, Fei Huang, Kewei Tu
Citations: 36 · 08 Apr 2020

On the Effect of Dropping Layers of Pre-trained Transformer Models
Hassan Sajjad, Fahim Dalvi, Nadir Durrani, Preslav Nakov
Citations: 132 · 08 Apr 2020

Towards Non-task-specific Distillation of BERT via Sentence Representation Approximation
Bowen Wu, Huan Zhang, Mengyuan Li, Zongsheng Wang, Qihang Feng, Junhong Huang, Baoxun Wang
Citations: 4 · 07 Apr 2020

MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices
Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou
Topics: MQ · Citations: 797 · 06 Apr 2020

FastBERT: a Self-distilling BERT with Adaptive Inference Time
Weijie Liu, Peng Zhou, Zhe Zhao, Zhiruo Wang, Haotang Deng, Qi Ju
Citations: 354 · 05 Apr 2020

Meta Fine-Tuning Neural Language Models for Multi-Domain Text Mining
Chengyu Wang, Minghui Qiu, Jun Huang, Xiaofeng He
Topics: AI4CE · Citations: 24 · 29 Mar 2020

Pre-trained Models for Natural Language Processing: A Survey
Xipeng Qiu, Tianxiang Sun, Yige Xu, Yunfan Shao, Ning Dai, Xuanjing Huang
Topics: LM&MA, VLM · Citations: 1,452 · 18 Mar 2020

A Survey on Contextual Embeddings
Qi Liu, Matt J. Kusner, Phil Blunsom
Citations: 146 · 16 Mar 2020

TRANS-BLSTM: Transformer with Bidirectional LSTM for Language Understanding
Zhiheng Huang, Peng Xu, Davis Liang, Ajay K. Mishra, Bing Xiang
Citations: 31 · 16 Mar 2020

Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation
Mitchell A. Gordon, Kevin Duh
Topics: CLL, VLM · Citations: 13 · 05 Mar 2020

TextBrewer: An Open-Source Knowledge Distillation Toolkit for Natural Language Processing
Ziqing Yang, Yiming Cui, Zhipeng Chen, Wanxiang Che, Ting Liu, Shijin Wang, Guoping Hu
Topics: VLM · Citations: 47 · 28 Feb 2020

A Primer in BERTology: What we know about how BERT works
Anna Rogers, Olga Kovaleva, Anna Rumshisky
Topics: OffRL · Citations: 1,461 · 27 Feb 2020

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT
Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yifan Yang, Hassan Sajjad, Preslav Nakov, Deming Chen, Marianne Winslett
Topics: AI4CE · Citations: 197 · 27 Feb 2020

Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li, Eric Wallace, Sheng Shen, Kevin Lin, Kurt Keutzer, Dan Klein, Joseph E. Gonzalez
Citations: 148 · 26 Feb 2020

MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang, Furu Wei, Li Dong, Hangbo Bao, Nan Yang, Ming Zhou
Topics: VLM · Citations: 1,209 · 25 Feb 2020

Improving BERT Fine-Tuning via Self-Ensemble and Self-Distillation
Yige Xu, Xipeng Qiu, L. Zhou, Xuanjing Huang
Citations: 65 · 24 Feb 2020

ScopeIt: Scoping Task Relevant Sentences in Documents
Vishwas Suryanarayanan, Barun Patra, P. Bhattacharya, C. Fufa, Charles Lee
Citations: 4 · 23 Feb 2020

TwinBERT: Distilling Knowledge to Twin-Structured BERT Models for Efficient Retrieval
Wenhao Lu, Jian Jiao, Ruofei Zhang
Citations: 50 · 14 Feb 2020

Subclass Distillation
Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
Citations: 33 · 10 Feb 2020

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
Citations: 197 · 07 Feb 2020

PoWER-BERT: Accelerating BERT Inference via Progressive Word-vector Elimination
Saurabh Goyal, Anamitra R. Choudhury, Saurabh ManishRaje, Venkatesan T. Chakaravarthy, Yogish Sabharwal, Ashish Verma
Citations: 18 · 24 Jan 2020

AdaBERT: Task-Adaptive BERT Compression with Differentiable Neural Architecture Search
Daoyuan Chen, Yaliang Li, Minghui Qiu, Zhen Wang, Bofang Li, Bolin Ding, Hongbo Deng, Jun Huang, Wei Lin, Jingren Zhou
Topics: MQ · Citations: 104 · 13 Jan 2020

The State of Knowledge Distillation for Classification
Fabian Ruffy, K. Chahal
Citations: 20 · 20 Dec 2019

WaLDORf: Wasteless Language-model Distillation On Reading-comprehension
J. Tian, A. Kreuzer, Pai-Hung Chen, Hans-Martin Will
Topics: VLM · Citations: 3 · 13 Dec 2019

Unsupervised Pre-training for Natural Language Generation: A Literature Review
Yuanxin Liu, Zheng Lin
Topics: SSL, AI4CE · Citations: 3 · 13 Nov 2019

Distilling Knowledge Learned in BERT for Text Generation
Yen-Chun Chen, Zhe Gan, Yu Cheng, Jingzhou Liu, Jingjing Liu
Citations: 28 · 10 Nov 2019

MKD: a Multi-Task Knowledge Distillation Approach for Pretrained Language Models
Linqing Liu, Haiquan Wang, Jimmy J. Lin, R. Socher, Caiming Xiong
Citations: 21 · 09 Nov 2019

Structured Pruning of Large Language Models
Ziheng Wang, Jeremy Wohlwend, Tao Lei
Citations: 281 · 10 Oct 2019

Knowledge Distillation from Internal Representations
Gustavo Aguilar, Yuan Ling, Yu Zhang, Benjamin Yao, Xing Fan, Edward Guo
Citations: 178 · 08 Oct 2019

Distilling BERT into Simple Neural Networks with Unlabeled Transfer Data
Subhabrata Mukherjee, Ahmed Hassan Awadallah
Citations: 25 · 04 Oct 2019

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Topics: SSL, AIMat · Citations: 6,377 · 26 Sep 2019

Extremely Small BERT Models from Mixed-Vocabulary Training
Sanqiang Zhao, Raghav Gupta, Yang Song, Denny Zhou
Topics: VLM · Citations: 53 · 25 Sep 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Topics: VLM · Citations: 1,819 · 23 Sep 2019

DDPNAS: Efficient Neural Architecture Search via Dynamic Distribution Pruning
Xiawu Zheng, Chenyi Yang, Shaokun Zhang, Yan Wang, Baochang Zhang, Yongjian Wu, Yunsheng Wu, Ling Shao, Rongrong Ji
Citations: 21 · 28 May 2019

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman
Topics: ELM · Citations: 6,984 · 20 Apr 2018

A Survey of Model Compression and Acceleration for Deep Neural Networks
Yu Cheng, Duo Wang, Pan Zhou, Zhang Tao
Citations: 1,087 · 23 Oct 2017