Dynamic Knowledge Distillation for Pre-trained Language Models
arXiv: 2109.11295 · 23 September 2021
Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun
Papers citing "Dynamic Knowledge Distillation for Pre-trained Language Models" (32 / 32 papers shown)
A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving
Yi Zhang, Guangyou Zhou, Zhiwen Xie, Jinjin Ma, Jimmy Xiangji Huang (08 Jan 2025) [AIMat]

Adrenaline: Adaptive Rendering Optimization System for Scalable Cloud Gaming
Jin Heo, Ketan Bhardwaj, Ada Gavrilovska (27 Dec 2024)

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu, Yin Yu, Xiaotian Han, Runchao Li, Xianxuan Long, Haotian Yu, Pan Li (25 Nov 2024) [SyDa]

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang (19 Jun 2024)

GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation
Wenjie Zhou, Zhenxin Ding, Xiaodong Zhang, Haibo Shi, Junfeng Wang, Dawei Yin (06 May 2024)

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee (03 Mar 2024)

Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu (01 Mar 2024)

Divide-or-Conquer? Which Part Should You Distill Your LLM?
Zhuofeng Wu, Richard He Bai, Aonan Zhang, Jiatao Gu, V. Vydiswaran, Navdeep Jaitly, Yizhe Zhang (22 Feb 2024) [LRM]

Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue, Nikolaos Aletras (11 Oct 2023)

Bridging the Gap between Decision and Logits in Decision-based Knowledge Distillation for Pre-trained Language Models
Qinhong Zhou, Zonghan Yang, Peng Li, Yang Liu (15 Jun 2023)

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, ..., Jiahao Liu, Jingang Wang, Shuo Zhao, Peng-Zhen Zhang, Jie Tang (11 Jun 2023) [ALM, MoE]

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shuo Zhao, Peng-Zhen Zhang, Jie Tang (11 Jun 2023) [VLM]

Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter
Yi Liu, Xiaohan Bi, Lei Li, Sishuo Chen, Wenkai Yang, Xu Sun (21 May 2023) [FedML]

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Yuxin Ren, Zi-Qi Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li (16 May 2023)

HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao (19 Feb 2023) [VLM]

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu (01 Feb 2023)

Knowledge Distillation ≈ Label Smoothing: Fact or Fallacy?
Md Arafat Sultan (30 Jan 2023)

ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization
Weixin Liu, Xuyi Chen, Jiaxiang Liu, Shi Feng, Yu Sun, Hao Tian, Hua Wu (09 Jan 2023)

Hint-dynamic Knowledge Distillation
Yiyang Liu, Chenxin Li, Xiaotong Tu, Xinghao Ding, Yue Huang (30 Nov 2022)

Gradient Knowledge Distillation for Pre-trained Language Models
Lean Wang, Lei Li, Xu Sun (02 Nov 2022) [VLM]

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi (14 Oct 2022)

From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun (11 Oct 2022) [VLM]

Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy
Jingru Li, Sheng Zhou, Liangcheng Li, Haishuai Wang, Zhi Yu, Jiajun Bu (29 Aug 2022)

Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction
Lei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun (04 Aug 2022)

Dynamic Contrastive Distillation for Image-Text Retrieval
Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Liqiong Shen, Dacheng Tao (04 Jul 2022) [VLM]

Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers
Sowmya S. Sundaram, Sairam Gurajada, M. Fisichella, Deepak P, Savitha Sam Abraham (31 May 2022) [ReLM]

MiniDisc: Minimal Distillation Schedule for Language Model Compression
Chen Zhang, Yang Yang, Qifan Wang, Jiahao Liu, Jingang Wang, Wei Wu, Dawei Song (29 May 2022)

Efficient Sub-structured Knowledge Distillation
Wenye Lin, Yangming Li, Lemao Liu, Shuming Shi, Haitao Zheng (09 Mar 2022)

Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models
Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun (14 Dec 2021) [MoMe]

Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett (17 Mar 2020) [UQLM]

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou (07 Feb 2020)

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal, Zoubin Ghahramani (06 Jun 2015) [UQCV, BDL]