Patient Knowledge Distillation for BERT Model Compression (arXiv:1908.09355)
25 August 2019
S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
Papers citing "Patient Knowledge Distillation for BERT Model Compression" (50 of 492 shown)
EM-Network: Oracle Guided Self-distillation for Sequence Learning (14 Jun 2023). J. Yoon, Sunghwan Ahn, Hyeon Seung Lee, Minchan Kim, Seokhwan Kim, N. Kim. [VLM]
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model (11 Jun 2023). Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, ..., Jiahao Liu, Jingang Wang, Shuo Zhao, Peng Zhang, Jie Tang. [ALM, MoE]
Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method (11 Jun 2023). Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shuo Zhao, Peng Zhang, Jie Tang. [VLM]
Modular Transformers: Compressing Transformers into Modularized Layers for Flexible Efficient Inference (04 Jun 2023). Wangchunshu Zhou, Ronan Le Bras, Yejin Choi.
LLMs Can Understand Encrypted Prompt: Towards Privacy-Computing Friendly Transformers (28 May 2023). Xuanqing Liu, Zhuotao Liu.
A Study on Knowledge Distillation from Weak Teacher for Scaling Up Pre-trained Language Models (26 May 2023). Hayeon Lee, Rui Hou, Jongpil Kim, Davis Liang, Sung Ju Hwang, Alexander Min.
SmartTrim: Adaptive Tokens and Attention Pruning for Efficient Vision-Language Models (24 May 2023). Zekun Wang, Jingchang Chen, Wangchunshu Zhou, Haichao Zhu, Jiafeng Liang, Liping Shan, Ming Liu, Dongliang Xu, Qing Yang, Bing Qin. [VLM]
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives (24 May 2023). Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank.
Just CHOP: Embarrassingly Simple LLM Compression (24 May 2023). A. Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy.
PruMUX: Augmenting Data Multiplexing with Model Compression (24 May 2023). Yushan Su, Vishvak Murahari, Karthik R. Narasimhan, Keqin Li.
Understanding the Effect of Data Augmentation on Knowledge Distillation (21 May 2023). Ziqi Wang, Chi Han, Wenxuan Bao, Heng Ji.
F-PABEE: Flexible-patience-based Early Exiting for Single-label and Multi-label text Classification Tasks (21 May 2023). Xiangxiang Gao, Wei-wei Zhu, Jiasheng Gao, Congrui Yin. [VLM]
Pruning Pre-trained Language Models with Principled Importance and Self-regularization (21 May 2023). Siyu Ren, Kenny Q. Zhu.
Task-agnostic Distillation of Encoder-Decoder Language Models (21 May 2023). Chen Zhang, Yang Yang, Jingang Wang, Dawei Song.
Lifting the Curse of Capacity Gap in Distilling Language Models (20 May 2023). Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song. [MoE]
DisCo: Distilled Student Models Co-training for Semi-supervised Text Mining (20 May 2023). Weifeng Jiang, Qianren Mao, Chenghua Lin, Jianxin Li, Ting Deng, Weiyi Yang, Zihan Wang.
LLM-Pruner: On the Structural Pruning of Large Language Models (19 May 2023). Xinyin Ma, Gongfan Fang, Xinchao Wang.
AD-KD: Attribution-Driven Knowledge Distillation for Language Model Compression (17 May 2023). Siyue Wu, Hongzhan Chen, Xiaojun Quan, Qifan Wang, Rui-cang Wang. [VLM]
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation (16 May 2023). Yuxin Ren, Zi-Qi Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li.
Weight-Inherited Distillation for Task-Agnostic BERT Compression (16 May 2023). Taiqiang Wu, Cheng-An Hou, Shanshan Lao, Jiayi Li, Ngai Wong, Zhe Zhao, Yujiu Yang.
Recyclable Tuning for Continual Pre-training (15 May 2023). Yujia Qin, Cheng Qian, Xu Han, Yankai Lin, Huadong Wang, Ruobing Xie, Zhiyuan Liu, Maosong Sun, Jie Zhou. [CLL]
ProKnow: Process Knowledge for Safety Constrained and Explainable Question Generation for Mental Health Diagnostic Assistance (13 May 2023). Kaushik Roy, Manas Gaur, Misagh Soltani, Vipula Rawte, Ashwin Kalyan, Amit P. Sheth. [AI4MH]
Processing Natural Language on Embedded Devices: How Well Do Transformer Models Perform? (23 Apr 2023). Souvik Sarkar, Mohammad Fakhruddin Babar, Md. Mahadi Hassan, M. Hasan, Shubhra (Santu) Karmaker.
Transformer-based models and hardware acceleration analysis in autonomous driving: A survey (21 Apr 2023). J. Zhong, Zheng Liu, Xiangshan Chen. [ViT]
Word Sense Induction with Knowledge Distillation from BERT (20 Apr 2023). Anik Saha, Alex Gittens, B. Yener.
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer (12 Apr 2023). Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai-xiang Chen, Ping Luo, Dahua Lin.
Vision Models Can Be Efficiently Specialized via Few-Shot Task-Aware Compression (25 Mar 2023). Denis Kuznedelev, Soroush Tabesh, Kimia Noorbakhsh, Elias Frantar, Sara Beery, Eldar Kurtic, Dan Alistarh. [MQ, VLM]
Heterogeneous-Branch Collaborative Learning for Dialogue Generation (21 Mar 2023). Yiwei Li, Shaoxiong Feng, Bin Sun, Kan Li.
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models (16 Mar 2023). Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee.
Gradient-Free Structured Pruning with Unlabeled Data (07 Mar 2023). Azade Nova, H. Dai, Dale Schuurmans. [SyDa]
Students Parrot Their Teachers: Membership Inference on Model Distillation (06 Mar 2023). Matthew Jagielski, Milad Nasr, Christopher A. Choquette-Choo, Katherine Lee, Nicholas Carlini. [FedML]
BPT: Binary Point Cloud Transformer for Place Recognition (02 Mar 2023). Zhixing Hou, Yuzhang Shang, Tian Gao, Yan Yan. [MQ, ViT]
Practical Knowledge Distillation: Using DNNs to Beat DNNs (23 Feb 2023). Chungman Lee, Pavlos Anastasios Apostolopulos, Igor L. Markov. [FedML]
Exploring Social Media for Early Detection of Depression in COVID-19 Patients (23 Feb 2023). Jiageng Wu, Xian Wu, Yining Hua, Shixu Lin, Yefeng Zheng, Jie Yang.
Teacher Intervention: Improving Convergence of Quantization Aware Training for Ultra-Low Precision Transformers (23 Feb 2023). Minsoo Kim, Kyuhong Shim, Seongmin Park, Wonyong Sung, Jungwook Choi. [MQ]
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers (19 Feb 2023). Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao. [VLM]
Stitchable Neural Networks (13 Feb 2023). Zizheng Pan, Jianfei Cai, Bohan Zhuang.
Lightweight Transformers for Clinical Natural Language Processing (09 Feb 2023). Omid Rohanian, Mohammadmahdi Nouriborji, Hannah Jauncey, Samaneh Kouchaki, Isaric Clinical Characterisation Group, Lei A. Clifton, L. Merson, David A. Clifton. [MedIm, LM&MA]
ZipLM: Inference-Aware Structured Pruning of Language Models (07 Feb 2023). Eldar Kurtic, Elias Frantar, Dan Alistarh. [MQ]
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective (03 Feb 2023). Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun.
idT5: Indonesian Version of Multilingual T5 Transformer (02 Feb 2023). Mukhlish Fuadi, A. Wibawa, S. Sumpeno.
Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection (01 Feb 2023). Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu.
Knowledge Distillation on Graphs: A Survey (01 Feb 2023). Yijun Tian, Shichao Pei, Xiangliang Zhang, Chuxu Zhang, Nitesh V. Chawla.
Understanding Self-Distillation in the Presence of Label Noise (30 Jan 2023). Rudrajit Das, Sujay Sanghavi.
Understanding INT4 Quantization for Transformer Models: Latency Speedup, Composability, and Failure Cases (27 Jan 2023). Xiaoxia Wu, Cheng-rong Li, Reza Yazdani Aminabadi, Z. Yao, Yuxiong He. [MQ]
Improved knowledge distillation by utilizing backward pass knowledge in neural networks (27 Jan 2023). A. Jafari, Mehdi Rezagholizadeh, A. Ghodsi.
Effective End-to-End Vision Language Pretraining with Semantic Visual Loss (18 Jan 2023). Xiaofeng Yang, Fayao Liu, Guosheng Lin. [VLM]
ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization (09 Jan 2023). Weixin Liu, Xuyi Chen, Jiaxiang Liu, Shi Feng, Yu Sun, Hao Tian, Hua Wu.
Cramming: Training a Language Model on a Single GPU in One Day (28 Dec 2022). Jonas Geiping, Tom Goldstein. [MoE]
Prototype-guided Cross-task Knowledge Distillation for Large-scale Models (26 Dec 2022). Deng Li, Aming Wu, Yahong Han, Qingwen Tian. [VLM]