Distilling Task-Specific Knowledge from BERT into Simple Neural Networks

28 March 2019
Raphael Tang, Yao Lu, Linqing Liu, Lili Mou, Olga Vechtomova, Jimmy J. Lin

Papers citing "Distilling Task-Specific Knowledge from BERT into Simple Neural Networks"

50 / 81 papers shown
Mitigating Catastrophic Forgetting in the Incremental Learning of Medical Images [CLL]
  Sara Yavari, Jacob Furst (28 Apr 2025)
Honey, I Shrunk the Language Model: Impact of Knowledge Distillation Methods on Performance and Explainability
  Daniel Hendriks, Philipp Spitzer, Niklas Kühl, G. Satzger (22 Apr 2025)
Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
  Mutian He, Philip N. Garner (09 Oct 2024)
Can't Hide Behind the API: Stealing Black-Box Commercial Embedding Models [MLAU, SILM]
  Manveer Singh Tamber, Jasper Xian, Jimmy Lin (13 Jun 2024)
Augmenting Offline RL with Unlabeled Data [OffRL]
  Zhao Wang, Briti Gangopadhyay, Jia-Fong Yeh, Shingo Takamatsu (11 Jun 2024)
Integrating Domain Knowledge for handling Limited Data in Offline RL [OffRL]
  Briti Gangopadhyay, Zhao Wang, Jia-Fong Yeh, Shingo Takamatsu (11 Jun 2024)
Sentence-Level or Token-Level? A Comprehensive Study on Knowledge Distillation
  Jingxuan Wei, Linzhuang Sun, Yichong Leng, Xu Tan, Bihui Yu, Ruifeng Guo (23 Apr 2024)
Task Integration Distillation for Object Detectors
  Hai Su, ZhenWen Jian, Songsen Yu (02 Apr 2024)
Teaching MLP More Graph Information: A Three-stage Multitask Knowledge Distillation Framework
  Junxian Li, Bin Shi, Erfei Cui, Hua Wei, Qinghua Zheng (02 Mar 2024)
Confidence Preservation Property in Knowledge Distillation Abstractions
  Dmitry Vengertsev, Elena Sherman (21 Jan 2024)
Mixed Distillation Helps Smaller Language Model Better Reasoning [LRM]
  Chenglin Li, Qianglong Chen, Liangyue Li, Wang Caiyu, Yicheng Li, Zhang Yin, Yin Zhang (17 Dec 2023)
Teacher-Student Architecture for Knowledge Distillation: A Survey
  Chengming Hu, Xuan Li, Danyang Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu (08 Aug 2023)
Accurate Retraining-free Pruning for Pretrained Encoder-based Language Models [VLM]
  Seungcheol Park, Ho-Jin Choi, U. Kang (07 Aug 2023)
f-Divergence Minimization for Sequence-Level Knowledge Distillation
  Yuqiao Wen, Zichao Li, Wenyu Du, Lili Mou (27 Jul 2023)
Vesper: A Compact and Effective Pretrained Model for Speech Emotion Recognition [VLM]
  Weidong Chen, Xiaofen Xing, Peihao Chen, Xiangmin Xu (20 Jul 2023)
Just CHOP: Embarrassingly Simple LLM Compression
  A. Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy (24 May 2023)
Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
  Yuxin Ren, Zi-Qi Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li (16 May 2023)
Elementwise Language Representation
  Du-Yeong Kim, Jeeeun Kim (27 Feb 2023)
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers [VLM]
  Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao (19 Feb 2023)
Distillation of encoder-decoder transformers for sequence labelling [VLM]
  M. Farina, D. Pappadopulo, Anant Gupta, Leslie Huang, Ozan Irsoy, Thamar Solorio (10 Feb 2023)
SpeechNet: Weakly Supervised, End-to-End Speech Recognition at Industrial Scale
  Raphael Tang, K. Kumar, Gefei Yang, Akshat Pandey, Yajie Mao, Vladislav Belyaev, Madhuri Emmadi, Craig Murray, Ferhan Ture, Jimmy J. Lin (21 Nov 2022)
Fast and Accurate FSA System Using ELBERT: An Efficient and Lightweight BERT
  Siyuan Lu, Chenchen Zhou, Keli Xie, Jun Lin, Zhongfeng Wang (16 Nov 2022)
Gradient Knowledge Distillation for Pre-trained Language Models [VLM]
  Lean Wang, Lei Li, Xu Sun (02 Nov 2022)
Teacher-Student Architecture for Knowledge Learning: A Survey
  Chengming Hu, Xuan Li, Dan Liu, Xi Chen, Ju Wang, Xue Liu (28 Oct 2022)
Augmentation with Projection: Towards an Effective and Efficient Data Augmentation Paradigm for Distillation
  Ziqi Wang, Yuexin Wu, Frederick Liu, Daogao Liu, Le Hou, Hongkun Yu, Jing Li, Heng Ji (21 Oct 2022)
An Effective, Performant Named Entity Recognition System for Noisy Business Telephone Conversation Transcripts
  Xue-Yong Fu, Cheng Chen, Md Tahmid Rahman Laskar, TN ShashiBhushan, Simon Corston-Oliver (27 Sep 2022)
Building an Efficiency Pipeline: Commutativity and Cumulativeness of Efficiency Operators for Transformers
  Ji Xin, Raphael Tang, Zhiying Jiang, Yaoliang Yu, Jimmy J. Lin (31 Jul 2022)
SDBERT: SparseDistilBERT, a faster and smaller BERT model [VLM, MQ]
  Devaraju Vinoda, P. K. Yadav (28 Jul 2022)
Chemical transformer compression for accelerating both training and inference of molecular modeling
  Yi Yu, K. Börjesson (16 May 2022)
Adaptable Adapters
  N. Moosavi, Quentin Delfosse, Kristian Kersting, Iryna Gurevych (03 May 2022)
Attention Mechanism with Energy-Friendly Operations [MU]
  Boyi Deng, Baosong Yang, Dayiheng Liu, Rong Xiao, Derek F. Wong, Haibo Zhang, Boxing Chen, Lidia S. Chao (28 Apr 2022)
A Review on Text-Based Emotion Detection -- Techniques, Applications, Datasets, and Future Directions
  Sheetal Kusal, S. Patil, J. Choudrie, K. Kotecha, D. Vora, I. Pappas (26 Apr 2022)
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation [CLL]
  Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart (15 Apr 2022)
Delta Keyword Transformer: Bringing Transformers to the Edge through Dynamically Pruned Multi-Head Self-Attention
  Zuzana Jelčicová, Marian Verhelst (20 Mar 2022)
VIRT: Improving Representation-based Models for Text Matching through Virtual Interaction
  Dan Li, Yang Yang, Hongyin Tang, Jingang Wang, Tong Xu, Wei Wu, Enhong Chen (08 Dec 2021)
Sparse Distillation: Speeding Up Text Classification by Using Bigger Student Models
  Qinyuan Ye, Madian Khabsa, M. Lewis, Sinong Wang, Xiang Ren, Aaron Jaech (16 Oct 2021)
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
  Takaaki Saeki, Shinnosuke Takamichi, Hiroshi Saruwatari (22 Sep 2021)
Block Pruning For Faster Transformers [VLM]
  François Lagunas, Ella Charlaix, Victor Sanh, Alexander M. Rush (10 Sep 2021)
FedKD: Communication Efficient Federated Learning via Knowledge Distillation [FedML]
  Chuhan Wu, Fangzhao Wu, Lingjuan Lyu, Yongfeng Huang, Xing Xie (30 Aug 2021)
Student Surpasses Teacher: Imitation Attack for Black-Box NLP APIs [MLAU]
  Qiongkai Xu, Xuanli He, Lingjuan Lyu, Lizhen Qu, Gholamreza Haffari (29 Aug 2021)
Trustworthy AI: A Computational Perspective [FaML]
  Haochen Liu, Yiqi Wang, Wenqi Fan, Xiaorui Liu, Yaxin Li, Shaili Jain, Yunhao Liu, Anil K. Jain, Jiliang Tang (12 Jul 2021)
Learned Token Pruning for Transformers
  Sehoon Kim, Sheng Shen, D. Thorsley, A. Gholami, Woosuk Kwon, Joseph Hassoun, Kurt Keutzer (02 Jul 2021)
Knowledge Distillation for Quality Estimation
  Amit Gajbhiye, M. Fomicheva, Fernando Alva-Manchego, Frédéric Blain, A. Obamuyide, Nikolaos Aletras, Lucia Specia (01 Jul 2021)
Elbert: Fast Albert with Confidence-Window Based Early Exit
  Keli Xie, Siyuan Lu, Meiqi Wang, Zhongfeng Wang (01 Jul 2021)
XtremeDistilTransformers: Task Transfer for Task-agnostic Distillation
  Subhabrata Mukherjee, Ahmed Hassan Awadallah, Jianfeng Gao (08 Jun 2021)
GPT3Mix: Leveraging Large-scale Language Models for Text Augmentation
  Kang Min Yoo, Dongju Park, Jaewook Kang, Sang-Woo Lee, Woomyeong Park (18 Apr 2021)
The NLP Cookbook: Modern Recipes for Transformer based Deep Learning Architectures [AI4TS]
  Sushant Singh, A. Mahmood (23 Mar 2021)
Learning to Augment for Data-Scarce Domain BERT Knowledge Distillation
  Lingyun Feng, Minghui Qiu, Yaliang Li, Haitao Zheng, Ying Shen (20 Jan 2021)
I-BERT: Integer-only BERT Quantization [MQ]
  Sehoon Kim, A. Gholami, Z. Yao, Michael W. Mahoney, Kurt Keutzer (05 Jan 2021)
LRC-BERT: Latent-representation Contrastive Knowledge Distillation for Natural Language Understanding
  Hao Fu, Shaojun Zhou, Qihong Yang, Junjie Tang, Guiquan Liu, Kaikui Liu, Xiaolong Li (14 Dec 2020)