BERT Learns to Teach: Knowledge Distillation with Meta Learning
Wangchunshu Zhou, Canwen Xu, Julian McAuley
8 June 2021 · arXiv:2106.04570
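
The title refers to knowledge distillation, in which a compact student model is trained to match a larger teacher. As a point of reference for the citing papers listed below, here is a minimal sketch of the standard soft-target distillation objective; the function name, temperature, and loss weighting are illustrative assumptions, and the meta-learning teacher update contributed by this paper is not shown.

    import torch
    import torch.nn.functional as F

    def kd_loss(student_logits, teacher_logits, labels, temperature=2.0, alpha=0.5):
        # Hard-label cross-entropy on the student's own predictions.
        ce = F.cross_entropy(student_logits, labels)
        # KL divergence to the teacher's temperature-softened distribution;
        # the T^2 factor keeps gradient magnitudes comparable across temperatures.
        soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
        log_soft_student = F.log_softmax(student_logits / temperature, dim=-1)
        kl = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * temperature ** 2
        return alpha * ce + (1.0 - alpha) * kl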

Papers citing "BERT Learns to Teach: Knowledge Distillation with Meta Learning"

20 / 20 papers shown

Knowledge Distillation with Adapted Weight (06 Jan 2025)
Sirong Wu, Xi Luo, Junjie Liu, Yuhui Deng

Multi-Level Optimal Transport for Universal Cross-Tokenizer Knowledge Distillation on Language Models (19 Dec 2024)
Xiao Cui, Mo Zhu, Yulei Qin, Liang Xie, Wengang Zhou, Yiming Li

Teacher-Student Architecture for Knowledge Distillation: A Survey (08 Aug 2023)
Chengming Hu, Xuan Li, Danyang Liu, Haolun Wu, Xi Chen, Ju Wang, Xue Liu

Meta-Tsallis-Entropy Minimization: A New Self-Training Approach for Domain Adaptation on Text Classification (04 Aug 2023) [OOD]
Menglong Lu, Zhen Huang, Zhiliang Tian, Yunxiang Zhao, Xuanyu Fei, Dongsheng Li

Lifting the Curse of Capacity Gap in Distilling Language Models (20 May 2023) [MoE]
Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song

Tailoring Instructions to Student's Learning Levels Boosts Knowledge Distillation (16 May 2023)
Yuxin Ren, Zi-Qi Zhong, Xingjian Shi, Yi Zhu, Chun Yuan, Mu Li

HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers (19 Feb 2023) [VLM]
Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao

Dialogue State Distillation Network with Inter-slot Contrastive Learning for Dialogue State Tracking (16 Feb 2023)
Jing Xu, Dandan Song, Chong Liu, Siu Cheung Hui, Fei Li, Qiang Ju, Xiaonan He, Jian Xie

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models (20 Dec 2022)
Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown

TencentPretrain: A Scalable and Flexible Toolkit for Pre-training Models of Different Modalities (13 Dec 2022) [VLM]
Zhe Zhao, Yudong Li, Cheng-An Hou, Jing-xin Zhao, Rong Tian, ..., Xingwu Sun, Zhanhui Kang, Xiaoyong Du, Linlin Shen, Kimmo Yan

Teacher-Student Architecture for Knowledge Learning: A Survey (28 Oct 2022)
Chengming Hu, Xuan Li, Dan Liu, Xi Chen, Ju Wang, Xue Liu

Multi-CLS BERT: An Efficient Alternative to Traditional Ensembling (10 Oct 2022)
Haw-Shiuan Chang, Ruei-Yao Sun, Kathryn Ricci, Andrew McCallum

PROD: Progressive Distillation for Dense Retrieval (27 Sep 2022)
Zhenghao Lin, Yeyun Gong, Xiao Liu, Hang Zhang, Chen Lin, ..., Jian Jiao, Jing Lu, Daxin Jiang, Rangan Majumder, Nan Duan

Parameter-Efficient and Student-Friendly Knowledge Distillation (28 May 2022)
Jun Rao, Xv Meng, Liang Ding, Shuhan Qi, Dacheng Tao

A Survey on Green Deep Learning (08 Nov 2021) [VLM]
Jingjing Xu, Wangchunshu Zhou, Zhiyi Fu, Hao Zhou, Lei Li

Learning Student-Friendly Teacher Networks for Knowledge Distillation (12 Feb 2021)
D. Park, Moonsu Cha, C. Jeong, Daesin Kim, Bohyung Han

Meta Pseudo Labels (23 Mar 2020) [VLM]
Hieu H. Pham, Zihang Dai, Qizhe Xie, Minh-Thang Luong, Quoc V. Le

BERT-of-Theseus: Compressing BERT by Progressive Module Replacing (07 Feb 2020)
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018) [ELM]
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman

Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks (09 Mar 2017) [OOD]
Chelsea Finn, Pieter Abbeel, Sergey Levine