MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers
  Wenhui Wang, Hangbo Bao, Shaohan Huang, Li Dong, Furu Wei [MQ] · arXiv:2012.15828 · 31 December 2020
Papers citing "MiniLMv2: Multi-Head Self-Attention Relation Distillation for Compressing Pretrained Transformers" (50 of 53 papers shown)
ABKD: Pursuing a Proper Allocation of the Probability Mass in Knowledge Distillation via α-β-Divergence
  Guanghui Wang, Zhiyong Yang, Zihan Wang, Shi Wang, Qianqian Xu, Qingming Huang · 07 May 2025
KETCHUP: K-Step Return Estimation for Sequential Knowledge Distillation
  Jiabin Fan, Guoqing Luo, Michael Bowling, Lili Mou [OffRL] · 26 Apr 2025
BadMoE: Backdooring Mixture-of-Experts LLMs via Optimizing Routing Triggers and Infecting Dormant Experts
  Qingyue Wang, Qi Pang, Xixun Lin, Shuai Wang, Daoyuan Wu [MoE] · 24 Apr 2025
Exploring and Controlling Diversity in LLM-Agent Conversation
  Kuanchao Chu, Yi-Pei Chen, Hideki Nakayama [LLMAG] · 24 Feb 2025
Learning to Sample Effective and Diverse Prompts for Text-to-Image Generation
  Taeyoung Yun, Dinghuai Zhang, Jinkyoo Park, Ling Pan [DiffM] · 17 Feb 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
  Nicolas Boizard, Kevin El Haddad, Céline Hudelot, Pierre Colombo · 28 Jan 2025
MAS-Attention: Memory-Aware Stream Processing for Attention Acceleration on Resource-Constrained Edge Devices
  Mohammadali Shakerdargah, Shan Lu, Chao Gao, Di Niu · 20 Nov 2024
Just KIDDIN: Knowledge Infusion and Distillation for Detection of INdecent Memes
  Rahul Garg, Trilok Padhi, Hemang Jain, Ugur Kursuncu, Ponnurangam Kumaraguru · 19 Nov 2024
MiniPLM: Knowledge Distillation for Pre-Training Language Models
  Yuxian Gu, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang · 22 Oct 2024
Self-Data Distillation for Recovering Quality in Pruned Large Language Models
  Vithursan Thangarasa, Ganesh Venkatesh, Mike Lasby, Nish Sinnadurai, Sean Lie [SyDa] · 13 Oct 2024
Mentor-KD: Making Small Language Models Better Multi-step Reasoners
  Hojae Lee, Junho Kim, SangKeun Lee [LRM] · 11 Oct 2024
NV-Retriever: Improving text embedding models with effective hard-negative mining
  Gabriel de Souza P. Moreira, Radek Osmulski, Mengyao Xu, Ronay Ak, Benedikt Schifferer, Even Oldridge [RALM] · 22 Jul 2024
Compact Language Models via Pruning and Knowledge Distillation
  Saurav Muralidharan, Sharath Turuvekere Sreenivas, Raviraj Joshi, Marcin Chochowski, M. Patwary, M. Shoeybi, Bryan Catanzaro, Jan Kautz, Pavlo Molchanov [SyDa, MQ] · 19 Jul 2024
The Scandinavian Embedding Benchmarks: Comprehensive Assessment of Multilingual and Monolingual Text Embedding
  Kenneth C. Enevoldsen, Márton Kardos, Niklas Muennighoff, Kristoffer Laigaard Nielbo · 04 Jun 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
  Seanie Lee, Minsu Kim, Lynn Cherif, David Dobre, Juho Lee, ..., Kenji Kawaguchi, Gauthier Gidel, Yoshua Bengio, Nikolay Malkin, Moksh Jain [AAML] · 28 May 2024
Efficient Models for the Detection of Hate, Abuse and Profanity
  Christoph Tillmann, Aashka Trivedi, Bishwaranjan Bhattacharjee [VLM] · 08 Feb 2024
Knowledge Distillation from Non-streaming to Streaming ASR Encoder using Auxiliary Non-streaming Layer
  Kyuhong Shim, Jinkyu Lee, Simyoung Chang, Kyuwoong Hwang · 31 Aug 2023
NoRefER: a Referenceless Quality Metric for Automatic Speech Recognition via Semi-Supervised Language Model Fine-Tuning with Contrastive Learning
  K. Yuksel, Thiago Castro Ferreira, Golara Javadi, Mohamed El-Badrashiny, Ahmet Gunduz · 21 Jun 2023
GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
  Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, ..., Jiahao Liu, Jingang Wang, Shuo Zhao, Peng-Zhen Zhang, Jie Tang [ALM, MoE] · 11 Jun 2023
Just CHOP: Embarrassingly Simple LLM Compression
  A. Jha, Tom Sherborne, Evan Pete Walsh, Dirk Groeneveld, Emma Strubell, Iz Beltagy · 24 May 2023
Lifting the Curse of Capacity Gap in Distilling Language Models
  Chen Zhang, Yang Yang, Jiahao Liu, Jingang Wang, Yunsen Xian, Benyou Wang, Dawei Song [MoE] · 20 May 2023
A Systematic Study of Knowledge Distillation for Natural Language Generation with Pseudo-Target Training
  Nitay Calderon, Subhabrata Mukherjee, Roi Reichart, Amir Kantor · 03 May 2023
RIFormer: Keep Your Vision Backbone Effective While Removing Token Mixer
  Jiahao Wang, Songyang Zhang, Yong Liu, Taiqiang Wu, Yujiu Yang, Xihui Liu, Kai-xiang Chen, Ping Luo, Dahua Lin · 12 Apr 2023
FrenchMedMCQA: A French Multiple-Choice Question Answering Dataset for Medical domain
  Yanis Labrak, Adrien Bazoge, Richard Dufour, Mickael Rouvier, Emmanuel Morin, B. Daille, P. Gourraud · 09 Apr 2023
The Semantic Reader Project: Augmenting Scholarly Documents through AI-Powered Interactive Reading Interfaces
  Kyle Lo, Joseph Chee Chang, Andrew Head, Jonathan Bragg, Amy X. Zhang, ..., Caroline M Wu, Jiangjiang Yang, Angele Zamarron, Marti A. Hearst, Daniel S. Weld · 25 Mar 2023
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
  Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee · 16 Mar 2023
Smooth and Stepwise Self-Distillation for Object Detection
  Jieren Deng, Xiaoxia Zhou, Hao Tian, Zhihong Pan, Derek Aguiar [ObjD] · 09 Mar 2023
UDAPDR: Unsupervised Domain Adaptation via LLM Prompting and Distillation of Rerankers
  Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, M. Franz, Salim Roukos, Avirup Sil, Md Arafat Sultan, Christopher Potts · 01 Mar 2023
Generic-to-Specific Distillation of Masked Autoencoders
  Wei Huang, Zhiliang Peng, Li Dong, Furu Wei, Jianbin Jiao, QiXiang Ye · 28 Feb 2023
KS-DETR: Knowledge Sharing in Attention Learning for Detection Transformer
  Kaikai Zhao, Norimichi Ukita [MU] · 22 Feb 2023
HomoDistil: Homotopic Task-Agnostic Distillation of Pre-trained Transformers
  Chen Liang, Haoming Jiang, Zheng Li, Xianfeng Tang, Bin Yin, Tuo Zhao [VLM] · 19 Feb 2023
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
  Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun · 03 Feb 2023
idT5: Indonesian Version of Multilingual T5 Transformer
  Mukhlish Fuadi, A. Wibawa, S. Sumpeno · 02 Feb 2023
ConvLab-3: A Flexible Dialogue System Toolkit Based on a Unified Data Format
  Qi Zhu, Christian Geishauser, Hsien-Chin Lin, Carel van Niekerk, Baolin Peng, ..., Dazhen Wan, Xiaochen Zhu, Jianfeng Gao, Milica Gašić, Minlie Huang · 30 Nov 2022
BudgetLongformer: Can we Cheaply Pretrain a SotA Legal Language Model From Scratch?
  Joel Niklaus, Daniele Giofré · 30 Nov 2022
QuaLA-MiniLM: a Quantized Length Adaptive MiniLM
  Shira Guskin, Moshe Wasserblat, Chang Wang, Haihao Shen [MQ] · 31 Oct 2022
Real-time Speech Interruption Analysis: From Cloud to Client Deployment
  Quchen Fu, Szu-Wei Fu, Yaran Fan, Yu-Huan Wu, Zhuo Chen, J. Gupchup, Ross Cutler · 24 Oct 2022
Multi-stage Distillation Framework for Cross-Lingual Semantic Similarity Matching
  Kunbo Ding, Weijie Liu, Yuejian Fang, Zhe Zhao, Qi Ju, Xuefeng Yang · 13 Sep 2022
Knowledge Distillation of Transformer-based Language Models Revisited
  Chengqiang Lu, Jianwei Zhang, Yunfei Chu, Zhengyu Chen, Jingren Zhou, Fei Wu, Haiqing Chen, Hongxia Yang [VLM] · 29 Jun 2022
A Closer Look at Self-Supervised Lightweight Vision Transformers
  Shaoru Wang, Jin Gao, Zeming Li, Jian Sun, Weiming Hu [ViT] · 28 May 2022
Scim: Intelligent Skimming Support for Scientific Papers
  Raymond Fok, Hita Kambhamettu, Luca Soldaini, Jonathan Bragg, Kyle Lo, Andrew Head, Marti A. Hearst, Daniel S. Weld · 09 May 2022
MoEBERT: from BERT to Mixture-of-Experts via Importance-Guided Adaptation
  Simiao Zuo, Qingru Zhang, Chen Liang, Pengcheng He, T. Zhao, Weizhu Chen [MoE] · 15 Apr 2022
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
  Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart [CLL] · 15 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
  Jinnian Zhang, Houwen Peng, Kan Wu, Mengchen Liu, Bin Xiao, Jianlong Fu, Lu Yuan [ViT] · 14 Apr 2022
Paying More Attention to Self-attention: Improving Pre-trained Language Models via Attention Guiding
  Shanshan Wang, Zhumin Chen, Z. Ren, Huasheng Liang, Qiang Yan, Pengjie Ren · 06 Apr 2022
FairLex: A Multilingual Benchmark for Evaluating Fairness in Legal Text Processing
  Ilias Chalkidis, Tommaso Pasini, Shenmin Zhang, Letizia Tomada, Sebastian Felix Schwemer, Anders Søgaard [AILaw] · 14 Mar 2022
ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
  Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang [ELM] · 23 Dec 2021
DeBERTaV3: Improving DeBERTa using ELECTRA-Style Pre-Training with Gradient-Disentangled Embedding Sharing
  Pengcheng He, Jianfeng Gao, Weizhu Chen · 18 Nov 2021
RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
  Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart · 21 Sep 2021
Finetuning Pretrained Transformers into RNNs
  Jungo Kasai, Hao Peng, Yizhe Zhang, Dani Yogatama, Gabriel Ilharco, Nikolaos Pappas, Yi Mao, Weizhu Chen, Noah A. Smith · 24 Mar 2021