Patient Knowledge Distillation for BERT Model Compression

S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
25 August 2019

Papers citing "Patient Knowledge Distillation for BERT Model Compression"

Showing 50 of 492 citing papers (title, authors, date, and community tags).
Knowledge Distillation of Russian Language Models with Reduction of Vocabulary
A. Kolesnikova, Yuri Kuratov, Vasily Konovalov, Andrey Kravchenko
04 May 2022 (VLM)
EasyNLP: A Comprehensive and Easy-to-use Toolkit for Natural Language Processing
Chengyu Wang, Minghui Qiu, Chen Shi, Taolin Zhang, Tingting Liu, Lei Li, Jie Wang, Ming Wang, Jun Huang, W. Lin
30 Apr 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
27 Apr 2022 (3DV)
Ultra Fast Speech Separation Model with Teacher Student Learning
Sanyuan Chen, Yu-Huan Wu, Zhuo Chen, Jian Wu, Takuya Yoshioka, Shujie Liu, Jinyu Li, Xiangzhan Yu
27 Apr 2022
Multimodal Adaptive Distillation for Leveraging Unimodal Encoders for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Xiyang Dai, ..., Jianwei Yang, Haoxuan You, Kai-Wei Chang, Shih-Fu Chang, Lu Yuan
22 Apr 2022 (VLM, OffRL)
ALBETO and DistilBETO: Lightweight Spanish Language Models
J. Canete, S. Donoso, Felipe Bravo-Marquez, Andrés Carvallo, Vladimir Araujo
19 Apr 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, T. Zhao, Weizhu Chen
15 Apr 2022 (MoE)
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart
15 Apr 2022 (CLL)
Structured Pruning Learns Compact and Accurate Models
Mengzhou Xia, Zexuan Zhong, Danqi Chen
01 Apr 2022 (VLM)
Feature Structure Distillation with Centered Kernel Alignment in BERT Transferring
Heeseung Jung, Doyeon Kim, Seung-Hoon Na, Kangil Kim
01 Apr 2022
Distill-VQ: Learning Retrieval Oriented Vector Quantization By Distilling Knowledge from Dense Embeddings
Shitao Xiao, Zheng Liu, Weihao Han, Jianjin Zhang, Defu Lian, ..., Hao Sun, Yingxia Shao, Denvy Deng, Qi Zhang, Xing Xie
01 Apr 2022
Self-Distillation from the Last Mini-Batch for Consistency Regularization
Yiqing Shen, Liwu Xu, Yuzhe Yang, Yaqian Li, Yandong Guo
30 Mar 2022
Graph Neural Networks in IoT: A Survey
Guimin Dong, Mingyue Tang, Zhiyuan Wang, Jiechao Gao, Sikun Guo, Lihua Cai, Robert Gutierrez, Brad Campbell, Laura E. Barnes, M. Boukhechba
29 Mar 2022 (GNN, AI4CE)
A Fast Post-Training Pruning Framework for Transformers
Woosuk Kwon, Sehoon Kim, Michael W. Mahoney, Joseph Hassoun, Kurt Keutzer, A. Gholami
29 Mar 2022
Knowledge Distillation: Bad Models Can Be Good Role Models
Gal Kaplun, Eran Malach, Preetum Nakkiran, Shai Shalev-Shwartz
28 Mar 2022 (FedML)
Pyramid-BERT: Reducing Complexity via Successive Core-set based Token Selection
Xin Huang, A. Khetan, Rene Bidart, Zohar Karnin
27 Mar 2022
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
Zuzana Jelčicová, Marian Verhelst
20 Mar 2022
Knowledge Amalgamation for Object Detection with Transformers
Haofei Zhang, Feng Mao, Mengqi Xue, Gongfan Fang, Zunlei Feng, Mingli Song
07 Mar 2022 (ViT)
A Simple Hash-Based Early Exiting Approach For Language Understanding and Generation
Tianxiang Sun, Xiangyang Liu, Wei-wei Zhu, Zhichao Geng, Lingling Wu, Yilong He, Yuan Ni, Guotong Xie, Xuanjing Huang, Xipeng Qiu
03 Mar 2022
TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar, Anthony Sarah, Sairam Sundaresan
24 Feb 2022 (MQ)
A Survey on Model Compression and Acceleration for Pretrained Language Models
Canwen Xu, Julian McAuley
15 Feb 2022
Exploring Inter-Channel Correlation for Diversity-preserved Knowledge Distillation
Li Liu, Qingle Huang, Sihao Lin, Hongwei Xie, Bing Wang, Xiaojun Chang, Xiao-Xue Liang
08 Feb 2022
Improving Robustness by Enhancing Weak Subnets
Yong Guo, David Stutz, Bernt Schiele
30 Jan 2022 (AAML)
AutoDistil: Few-shot Task-agnostic Neural Architecture Search for Distilling Large Language Models
Dongkuan Xu, Subhabrata Mukherjee, Xiaodong Liu, Debadeepta Dey, Wenhui Wang, Xiang Zhang, Ahmed Hassan Awadallah, Jianfeng Gao
29 Jan 2022
AutoDistill: an End-to-End Framework to Explore and Distill Hardware-Efficient Language Models
Xiaofan Zhang, Zongwei Zhou, Deming Chen, Yu Emma Wang
21 Jan 2022
VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer
Mengshu Sun, Haoyu Ma, Guoliang Kang, Yi Ding, Tianlong Chen, Xiaolong Ma, Zhangyang Wang, Yanzhi Wang
17 Jan 2022 (ViT)
Ensemble Transformer for Efficient and Accurate Ranking Tasks: an Application to Question Answering Systems
Yoshitomo Matsubara, Luca Soldaini, Eric Lind, Alessandro Moschitti
15 Jan 2022
CLIP-TD: CLIP Targeted Distillation for Vision-Language Tasks
Zhecan Wang, Noel Codella, Yen-Chun Chen, Luowei Zhou, Jianwei Yang, Xiyang Dai, Bin Xiao, Haoxuan You, Shih-Fu Chang, Lu Yuan
15 Jan 2022 (CLIP, VLM)
DeepSpeed-MoE: Advancing Mixture-of-Experts Inference and Training to Power Next-Generation AI Scale
Samyam Rajbhandari, Conglong Li, Z. Yao, Minjia Zhang, Reza Yazdani Aminabadi, A. A. Awan, Jeff Rasley, Yuxiong He
14 Jan 2022
GateFormer: Speeding Up News Feed Recommendation with Input Gated Transformers
Peitian Zhang, Zheng Liu
12 Jan 2022 (AI4TS)
Conditional Generative Data-free Knowledge Distillation
Xinyi Yu, Ling Yan, Yang Yang, Libo Zhou, Linlin Ou
31 Dec 2021
Data-Free Knowledge Transfer: A Survey
Yuang Liu, Wei Zhang, Jun Wang, Jianyong Wang
31 Dec 2021
Automatic Mixed-Precision Quantization Search of BERT
Changsheng Zhao, Ting Hua, Yilin Shen, Qian Lou, Hongxia Jin
30 Dec 2021 (MQ)
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang
23 Dec 2021 (ELM)
Distilled Dual-Encoder Model for Vision-Language Understanding
Zekun Wang, Wenhui Wang, Haichao Zhu, Ming Liu, Bing Qin, Furu Wei
16 Dec 2021 (VLM, FedML)
Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models
Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun
14 Dec 2021 (MoMe)
From Dense to Sparse: Contrastive Pruning for Better Pre-trained Language Model Compression
Runxin Xu, Fuli Luo, Chengyu Wang, Baobao Chang, Jun Huang, Songfang Huang, Fei Huang
14 Dec 2021 (VLM)
On the Compression of Natural Language Models
S. Damadi
13 Dec 2021
DistilCSE: Effective Knowledge Distillation For Contrastive Sentence Embeddings
Chaochen Gao, Xing Wu, Peng Wang, Jue Wang, Liangjun Zang, Zhongyuan Wang, Songlin Hu
10 Dec 2021
VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Xu, Wei Wu, Enhong Chen
08 Dec 2021
Causal Distillation for Language Models
Zhengxuan Wu, Atticus Geiger, J. Rozner, Elisa Kreiss, Hanson Lu, Thomas Icard, Christopher Potts, Noah D. Goodman
05 Dec 2021
WiFi-based Multi-task Sensing
Xie Zhang, Chengpei Tang, Yasong An, Kang Yin
26 Nov 2021
Hierarchical Knowledge Distillation for Dialogue Sequence Labeling
Shota Orihashi, Yoshihiro Yamazaki, Naoki Makishima, Mana Ihori, Akihiko Takashima, Tomohiro Tanaka, Ryo Masumura
22 Nov 2021
A Survey on Green Deep Learning
Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li
08 Nov 2021 (VLM)
Leveraging Advantages of Interactive and Non-Interactive Models for Vector-Based Cross-Lingual Information Retrieval
Linlong Xu, Baosong Yang, Xiaoyu Lv, Tianchi Bi, Dayiheng Liu, Haibo Zhang
03 Nov 2021
Magic Pyramid: Accelerating Inference with Early Exiting and Token Pruning
Xuanli He, I. Keivanloo, Yi Xu, Xiang He, Belinda Zeng, Santosh Rajagopalan, Trishul Chilimbi
30 Oct 2021
NxMTransformer: Semi-Structured Sparsification for Natural Language Understanding via ADMM
Connor Holmes, Minjia Zhang, Yuxiong He, Bo Wu
28 Oct 2021
Mosaicking to Distill: Knowledge Distillation from Out-of-Domain Data
Gongfan Fang, Yifan Bao, Mingli Song, Xinchao Wang, Don Xie, Chengchao Shen, Xiuming Zhang
27 Oct 2021
How and When Adversarial Robustness Transfers in Knowledge Distillation?
Rulin Shao, Ming Zhou, C. Bezemer, Cho-Jui Hsieh
22 Oct 2021 (AAML)
Ensemble ALBERT on SQuAD 2.0
Shilun Li, Renee Li, Veronica Peng
19 Oct 2021 (MoE)