BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance
arXiv 2010.06133 · 13 October 2020
Jianquan Li, Xiaokang Liu, Honghong Zhao, Ruifeng Xu, Min Yang, Yaohong Jin
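The page itself does not describe the method, but the title names the core idea: treat the teacher's and student's layers as two weighted sets and find a many-to-many mapping between them by solving an optimal-transport (Earth Mover's Distance) problem whose cost is the per-layer-pair discrepancy. Below is a minimal sketch, not the authors' code: it assumes the open-source POT library (`pip install POT`), toy per-layer representations in place of real hidden states, uniform layer weights, and an MSE cost (BERT-EMD also adapts the layer weights during training).

```python
# Minimal sketch (not the authors' code) of an EMD-based many-to-many
# layer mapping loss, assuming the POT library (`pip install POT`).
# Toy per-layer representations stand in for real hidden states;
# uniform layer weights and an MSE cost are simplifications.
import numpy as np
import ot  # Python Optimal Transport

rng = np.random.default_rng(0)
T_layers, S_layers, d = 12, 4, 768        # teacher layers, student layers, hidden size
teacher = rng.normal(size=(T_layers, d))  # stand-in teacher layer outputs
student = rng.normal(size=(S_layers, d))  # stand-in student layer outputs

# Cost matrix: discrepancy between every (teacher layer, student layer) pair.
cost = ((teacher[:, None, :] - student[None, :, :]) ** 2).mean(axis=-1)

# Layer weights: uniform here; BERT-EMD adapts these during training.
w_teacher = np.full(T_layers, 1.0 / T_layers)
w_student = np.full(S_layers, 1.0 / S_layers)

# Exact optimal transport: the plan is the many-to-many layer mapping,
# and the induced transport cost is the distillation loss term.
plan = ot.emd(w_teacher, w_student, cost)  # shape (T_layers, S_layers)
emd_loss = float((plan * cost).sum())
print(plan.round(3))
print("EMD layer-mapping loss:", emd_loss)
```

Each nonzero entry of the plan says how much of a teacher layer's knowledge flows to a given student layer, which is what makes the mapping many-to-many rather than the fixed one-to-one skip mapping used by earlier layer-distillation methods such as TinyBERT.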
Papers citing "BERT-EMD: Many-to-Many Layer Mapping for BERT Compression with Earth Mover's Distance" (13 papers)
f-Divergence Minimization for Sequence-Level Knowledge Distillation
Yuqiao Wen, Zichao Li, Wenyu Du, Lili Mou (27 Jul 2023)
Lifting the Curse of Capacity Gap in Distilling Language Models
Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song (20 May 2023) · MoE
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee (16 Mar 2023)
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun (03 Feb 2023)
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, T. Zhao, Weizhu Chen (15 Apr 2022) · MoE
Dynamic Knowledge Distillation for Pre-trained Language Models
Lei Li, Yankai Lin, Shuhuai Ren, Peng Li, Jie Zhou, Xu Sun (23 Sep 2021)
From Discourse to Narrative: Knowledge Projection for Event Relation Extraction
Jialong Tang, Hongyu Lin, M. Liao, Yaojie Lu, Xianpei Han, Le Sun, Weijian Xie, Jin Xu (16 Jun 2021)
ERNIE-Tiny: A Progressive Distillation Framework for Pretrained Transformer Compression
Weiyue Su, Xuyi Chen, Shi Feng, Jiaxiang Liu, Weixin Liu, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang (04 Jun 2021)
ROSITA: Refined BERT cOmpreSsion with InTegrAted techniques
Yuanxin Liu, Zheng Lin, Fengcheng Yuan (21 Mar 2021) · VLM, MQ
MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei (31 Dec 2020) · MQ
BERT-of-Theseus: Compressing BERT by Progressive Module Replacing
Canwen Xu, Wangchunshu Zhou, Tao Ge, Furu Wei, Ming Zhou (07 Feb 2020)
Q-BERT: Hessian Based Ultra Low Precision Quantization of BERT
Sheng Shen, Zhen Dong, Jiayu Ye, Linjian Ma, Z. Yao, A. Gholami, Michael W. Mahoney, Kurt Keutzer (12 Sep 2019) · MQ
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman (20 Apr 2018) · ELM