Dynamic Knowledge Distillation for Pre-trained Language Models

23 September 2021
Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun

Papers citing "Dynamic Knowledge Distillation for Pre-trained Language Models"

32 papers shown

A Diversity-Enhanced Knowledge Distillation Model for Practical Math Word Problem Solving
Yi Zhang, Guangyou Zhou, Zhiwen Xie, Jinjin Ma, Jimmy Xiangji Huang
AIMat · 08 Jan 2025

Adrenaline: Adaptive Rendering Optimization System for Scalable Cloud Gaming
Jin Heo, Ketan Bhardwaj, Ada Gavrilovska
27 Dec 2024

Dynamic Self-Distillation via Previous Mini-batches for Fine-tuning Small Language Models
Y. Fu, Yin Yu, Xiaotian Han, Runchao Li, Xianxuan Long, Haotian Yu, Pan Li
SyDa · 25 Nov 2024

Slice-Level Scheduling for High Throughput and Load Balanced LLM Serving
Ke Cheng, Wen Hu, Zhi Wang, Hongen Peng, Jianguo Li, Sheng Zhang
19 Jun 2024

GOVERN: Gradient Orientation Vote Ensemble for Multi-Teacher Reinforced Distillation
Wenjie Zhou, Zhenxin Ding, Xiaodong Zhang, Haibo Shi, Junfeng Wang, Dawei Yin
06 May 2024

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee
03 Mar 2024

Multimodal ArXiv: A Dataset for Improving Scientific Comprehension of Large Vision-Language Models
Lei Li, Yuqi Wang, Runxin Xu, Peiyi Wang, Xiachong Feng, Lingpeng Kong, Qi Liu
01 Mar 2024

Divide-or-Conquer? Which Part Should You Distill Your LLM?
Zhuofeng Wu, Richard He Bai, Aonan Zhang, Jiatao Gu, V. Vydiswaran, Navdeep Jaitly, Yizhe Zhang
LRM · 22 Feb 2024

Pit One Against Many: Leveraging Attention-head Embeddings for Parameter-efficient Multi-head Attention
Huiyin Xue, Nikolaos Aletras
11 Oct 2023

Bridging the Gap between Decision and Logits in Decision-based Knowledge Distillation for Pre-trained Language Models
Qinhong Zhou, Zonghan Yang, Peng Li, Yang Liu
15 Jun 2023

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, ..., Jiahao Liu, Jingang Wang, Shuo Zhao, Peng-Zhen Zhang, Jie Tang
ALM, MoE · 11 Jun 2023

Are Intermediate Layers and Labels Really Necessary? A General Language Model Distillation Method
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Shuo Zhao, Peng-Zhen Zhang, Jie Tang
VLM · 11 Jun 2023

Communication Efficient Federated Learning for Multilingual Neural Machine Translation with Adapter
Yi Liu, Xiaohan Bi, Lei Li, Sishuo Chen, Wenkai Yang, Xu Sun
FedML · 21 May 2023

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation
Yuxin Ren, Zi-Qi Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li
16 May 2023

HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao
VLM · 19 Feb 2023

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu
01 Feb 2023

Knowledge Distillation ≈ Label Smoothing: Fact or Fallacy?
Md Arafat Sultan
30 Jan 2023

ERNIE 3.0 Tiny: Frustratingly Simple Method to Improve Task-Agnostic Distillation Generalization
Weixin Liu, Xuyi Chen, Jiaxiang Liu, Shi Feng, Yu Sun, Hao Tian, Hua Wu
09 Jan 2023

Hint-dynamic Knowledge Distillation
Yiyang Liu, Chenxin Li, Xiaotong Tu, Xinghao Ding, Yue Huang
30 Nov 2022

Gradient Knowledge Distillation for Pre-trained Language Models
Lean Wang, Lei Li, Xu Sun
VLM · 02 Nov 2022

DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi
14 Oct 2022

From Mimicking to Integrating: Knowledge Integration for Pre-Trained Language Models
Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun
VLM · 11 Oct 2022

Dynamic Data-Free Knowledge Distillation by Easy-to-Hard Learning Strategy
Jingru Li, Sheng Zhou, Liangcheng Li, Haishuai Wang, Zhi Yu, Jiajun Bu
29 Aug 2022

Distributional Correlation-Aware Knowledge Distillation for Stock Trading Volume Prediction
Lei Li, Zhiyuan Zhang, Ruihan Bao, Keiko Harimoto, Xu Sun
04 Aug 2022

Dynamic Contrastive Distillation for Image-Text Retrieval
Jun Rao, Liang Ding, Shuhan Qi, Meng Fang, Yang Liu, Liqiong Shen, Dacheng Tao
VLM · 04 Jul 2022

Why are NLP Models Fumbling at Elementary Math? A Survey of Deep Learning based Word Problem Solvers
Sowmya S. Sundaram, Sairam Gurajada, M. Fisichella, Deepak P, Savitha Sam Abraham
ReLM · 31 May 2022

MiniDisc: Minimal Distillation Schedule for Language Model Compression
Chen Zhang, Yang Yang, Qifan Wang, Jiahao Liu, Jingang Wang, Wei Wu, Dawei Song
29 May 2022

Efficient Sub-structured Knowledge Distillation
Wenye Lin, Yangming Li, Lemao Liu, Shuming Shi, Haitao Zheng
09 Mar 2022

Model Uncertainty-Aware Knowledge Amalgamation for Pre-Trained Language Models
Lei Li, Yankai Lin, Xuancheng Ren, Guangxiang Zhao, Peng Li, Jie Zhou, Xu Sun
MoMe · 14 Dec 2021

Calibration of Pre-trained Transformers
Shrey Desai, Greg Durrett
UQLM · 17 Mar 2020

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou
07 Feb 2020

Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning
Y. Gal, Zoubin Ghahramani
UQCV, BDL · 06 Jun 2015