arXiv:2010.03034
Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers
6 October 2020
Yimeng Wu, Peyman Passban, Mehdi Rezagholizadeh, Qun Liu
Papers citing "Why Skip If You Can Combine: A Simple Knowledge Distillation Technique for Intermediate Layers" (26 papers)
Applications of Knowledge Distillation in Remote Sensing: A Survey. Yassine Himeur, N. Aburaed, O. Elharrouss, Iraklis Varlamis, Shadi Atalla, W. Mansoor, Hussain Al Ahmad. 18 Sep 2024.
Enhancing Low-Resource NMT with a Multilingual Encoder and Knowledge Distillation: A Case Study. Aniruddha Roy, Pretam Ray, Ayush Maheshwari, Sudeshna Sarkar, Pawan Goyal. 09 Jul 2024.
Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation. Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee. 03 Mar 2024.
A Comprehensive Survey of Compression Algorithms for Language Models. Seungcheol Park, Jaehyeon Choi, Sojin Lee, U. Kang. 27 Jan 2024.
What is Lost in Knowledge Distillation? Manas Mohanty, Tanya Roosta, Peyman Passban. 07 Nov 2023.
A Comparative Analysis of Task-Agnostic Distillation Methods for Compressing Transformer Language Models. Takuma Udagawa, Aashka Trivedi, Michele Merler, Bishwaranjan Bhattacharjee. 13 Oct 2023.
Heterogeneous Generative Knowledge Distillation with Masked Image Modeling. Ziming Wang, Shumin Han, Xiaodi Wang, Jing Hao, Xianbin Cao, Baochang Zhang. 18 Sep 2023.
How to Distill your BERT: An Empirical Study on the Impact of Weight Initialisation and Distillation Objectives. Xinpeng Wang, Leonie Weissweiler, Hinrich Schütze, Barbara Plank. 24 May 2023.
Towards Understanding and Improving Knowledge Distillation for Neural Machine Translation. Songming Zhang, Yunlong Liang, Shuaibo Wang, Wenjuan Han, Jian Liu, Jinan Xu. 14 May 2023.
Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models. Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee. 16 Mar 2023.
Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective. Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun. 03 Feb 2023.
SMaLL-100: Introducing Shallow Multilingual Machine Translation Model for Low-Resource Languages. Alireza Mohammadshahi, Vassilina Nikoulina, Alexandre Berard, Caroline Brun, James Henderson, Laurent Besacier. 20 Oct 2022.
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation. Mojtaba Valipour, Mehdi Rezagholizadeh, I. Kobyzev, A. Ghodsi. 14 Oct 2022.
Deep versus Wide: An Analysis of Student Architectures for Task-Agnostic Knowledge Distillation of Self-Supervised Speech Models. Takanori Ashihara, Takafumi Moriya, Kohei Matsuura, Tomohiro Tanaka. 14 Jul 2022.
Do we need Label Regularization to Fine-tune Pre-trained Language Models? I. Kobyzev, A. Jafari, Mehdi Rezagholizadeh, Tianda Li, Alan Do-Omri, Peng Lu, Pascal Poupart, A. Ghodsi. 25 May 2022.
CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation. Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart. 15 Apr 2022.
Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher. Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi. 16 Oct 2021.
RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation. Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart. 21 Sep 2021.
Knowledge Distillation with Noisy Labels for Natural Language Understanding. Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh. 21 Sep 2021.
How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding. Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh. 13 Sep 2021.
Marginal Utility Diminishes: Exploring the Minimum Knowledge for BERT Knowledge Distillation. Yuanxin Liu, Fandong Meng, Zheng Lin, Weiping Wang, Jie Zhou. 10 Jun 2021.
Not Far Away, Not So Close: Sample Efficient Nearest Neighbour Data Augmentation via MiniMax. Ehsan Kamalloo, Mehdi Rezagholizadeh, Peyman Passban, Ali Ghodsi. 28 May 2021.
Selective Knowledge Distillation for Neural Machine Translation. Fusheng Wang, Jianhao Yan, Fandong Meng, Jie Zhou. 27 May 2021.
Towards Zero-Shot Knowledge Distillation for Natural Language Processing. Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh. 31 Dec 2020.
ALP-KD: Attention-Based Layer Projection for Knowledge Distillation. Peyman Passban, Yimeng Wu, Mehdi Rezagholizadeh, Qun Liu. 27 Dec 2020.
OpenNMT: Open-Source Toolkit for Neural Machine Translation. Guillaume Klein, Yoon Kim, Yuntian Deng, Jean Senellart, Alexander M. Rush. 10 Jan 2017.