arXiv:2010.03034
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
6 October 2020
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu
Papers citing "Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers" (26 papers)
Applications of Knowledge Distillation in Remote Sensing: A Survey. Yassine Himeur, N. Aburaed, O. Elharrouss, Iraklis Varlamis, Shadi Atalla, W. Mansoor, Hussain Al Ahmad. 18 Sep 2024.
Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study. Aniruddha Roy, Pretam Ray, Ayush Maheshwari, Sudeshna Sarkar, Pawan Goyal. 09 Jul 2024.
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation. Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee. 03 Mar 2024.
A Comprehensive Survey of Compression Algorithms for Language Models. Seungcheol Park, Jaehyeon Choi, Sojin Lee, U. Kang. 27 Jan 2024.
What is Lost in Knowledge Distillation? Manas Mohanty, Tanya Roosta, Peyman Passban. 07 Nov 2023.
A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models. Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee. 13 Oct 2023.
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling. Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang. 18 Sep 2023.
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives. Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank. 24 May 2023.
Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation. Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu. 14 May 2023.
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models. Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee. 16 Mar 2023.
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective. Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun. 03 Feb 2023.
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages. Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier. 20 Oct 2022.
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation. Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi. 14 Oct 2022.
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models. Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka. 14 Jul 2022.
Do we need Label Regularization to Fine-tune Pre-trained Language Models? I. Kobyzev, A. Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, A. Ghodsi. 25 May 2022.
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation. Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart. 15 Apr 2022.
Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher. Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi. 16 Oct 2021.
RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation. Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart. 21 Sep 2021.
Knowledge Distillation with Noisy Labels for Natural Language Understanding. Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh. 21 Sep 2021.
How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding. Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh. 13 Sep 2021.
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation. Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou. 10 Jun 2021.
Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax. Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi. 28 May 2021.
Selective Knowledge Distillation for Neural Machine Translation. Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou. 27 May 2021.
Towards Zero-Shot Knowledge Distillation for Natural Language Processing. Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh. 31 Dec 2020.
ALP-KD: Attention-Based Layer Projection for Knowledge Distillation. Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu. 27 Dec 2020.
OpenNMT: Open-Source Toolkit for Neural Machine Translation. Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, Alexander M. Rush. 10 Jan 2017.