Annealing Knowledge Distillation (arXiv:2104.07163)
14 April 2021 · A. Jafari, Mehdi Rezagholizadeh, Pranav Sharma, A. Ghodsi

Papers citing "Annealing Knowledge Distillation" (25 papers)

Revisiting the Relationship between Adversarial and Clean Training: Why Clean Training Can Make Adversarial Training Better
MingWei Zhou, Xiaobing Pei · AAML · 0 citations · 30 Mar 2025

Joint Fine-tuning and Conversion of Pretrained Speech and Language Models towards Linear Complexity
Mutian He, Philip N. Garner · 0 citations · 09 Oct 2024

Topological Persistence Guided Knowledge Distillation for Wearable Sensor Data
Eun Som Jeon, Hongjun Choi, A. Shukla, Yuan Wang, Hyunglae Lee, M. Buman, Pavan Turaga · 3 citations · 07 Jul 2024

AdaKD: Dynamic Knowledge Distillation of ASR models using Adaptive Loss Weighting
Shreyan Ganguly, Roshan Nayak, Rakshith Rao, Ujan Deb, AP Prathosh · 1 citation · 11 May 2024

GKD: A General Knowledge Distillation Framework for Large-scale Pre-trained Language Model
Shicheng Tan, Weng Lam Tam, Yuanchun Wang, Wenwen Gong, Yang Yang, ..., Jiahao Liu, Jingang Wang, Shuo Zhao, Peng Zhang, Jie Tang · ALM, MoE · 11 citations · 11 Jun 2023

Neural Architecture Search for Effective Teacher-Student Knowledge Transfer in Language Models
Aashka Trivedi, Takuma Udagawa, Michele Merler, Yikang Shen, Yousef El-Kurdi, Bishwaranjan Bhattacharjee · 7 citations · 16 Mar 2023

Revisiting Intermediate Layer Distillation for Compressing Language Models: An Overfitting Perspective
Jongwoo Ko, Seungjoon Park, Minchan Jeong, S. Hong, Euijai Ahn, Duhyeuk Chang, Se-Young Yun · 6 citations · 03 Feb 2023

Improved Knowledge Distillation for Pre-trained Language Models via Knowledge Selection
Chenglong Wang, Yi Lu, Yongyu Mu, Yimin Hu, Tong Xiao, Jingbo Zhu · 8 citations · 01 Feb 2023

CAMeMBERT: Cascading Assistant-Mediated Multilingual BERT
Dan DeGenaro, Jugal Kalita · 0 citations · 22 Dec 2022

In-context Learning Distillation: Transferring Few-shot Learning Ability of Pre-trained Language Models
Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown · 30 citations · 20 Dec 2022

PSAQ-ViT V2: Towards Accurate and General Data-Free Quantization for Vision Transformers
Zhikai Li, Mengjuan Chen, Junrui Xiao, Qingyi Gu · ViT, MQ · 33 citations · 13 Sep 2022

CILDA: Contrastive Data Augmentation using Intermediate Layer Knowledge Distillation
Md. Akmal Haidar, Mehdi Rezagholizadeh, Abbas Ghaddar, Khalil Bibi, Philippe Langlais, Pascal Poupart · CLL · 6 citations · 15 Apr 2022

ERNIE 3.0 Titan: Exploring Larger-scale Knowledge Enhanced Pre-training for Language Understanding and Generation
Shuohuan Wang, Yu Sun, Yang Xiang, Zhihua Wu, Siyu Ding, ..., Tian Wu, Wei Zeng, Ge Li, Wen Gao, Haifeng Wang · ELM · 79 citations · 23 Dec 2021

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi · 18 citations · 16 Oct 2021

A Short Study on Compressing Decoder-Based Language Models
Tianda Li, Yassir El Mesbahi, I. Kobyzev, Ahmad Rashid, A. Mahmud, Nithin Anchuri, Habib Hajimolahoseini, Yang Liu, Mehdi Rezagholizadeh · 25 citations · 16 Oct 2021

Kronecker Decomposition for GPT Compression
Ali Edalati, Marzieh S. Tahaei, Ahmad Rashid, V. Nia, J. Clark, Mehdi Rezagholizadeh · 33 citations · 15 Oct 2021

Language Modelling via Learning to Rank
A. Frydenlund, Gagandeep Singh, Frank Rudzicz · 7 citations · 13 Oct 2021

RAIL-KD: RAndom Intermediate Layer Mapping for Knowledge Distillation
Md. Akmal Haidar, Nithin Anchuri, Mehdi Rezagholizadeh, Abbas Ghaddar, Philippe Langlais, Pascal Poupart · 22 citations · 21 Sep 2021

Knowledge Distillation with Noisy Labels for Natural Language Understanding
Shivendra Bhardwaj, Abbas Ghaddar, Ahmad Rashid, Khalil Bibi, Cheng-huan Li, A. Ghodsi, Philippe Langlais, Mehdi Rezagholizadeh · 1 citation · 21 Sep 2021

iRNN: Integer-only Recurrent Neural Network
Eyyub Sari, Vanessa Courville, V. Nia · MQ · 4 citations · 20 Sep 2021

How to Select One Among All? An Extensive Empirical Study Towards the Robustness of Knowledge Distillation in Natural Language Understanding
Tianda Li, Ahmad Rashid, A. Jafari, Pranav Sharma, A. Ghodsi, Mehdi Rezagholizadeh · AAML · 5 citations · 13 Sep 2021

Learning to Teach with Student Feedback
Yitao Liu, Tianxiang Sun, Xipeng Qiu, Xuanjing Huang · VLM · 6 citations · 10 Sep 2021

Towards Zero-Shot Knowledge Distillation for Natural Language Processing
Ahmad Rashid, Vasileios Lioutas, Abbas Ghaddar, Mehdi Rezagholizadeh · 27 citations · 31 Dec 2020

GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding
Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · ELM · 6,984 citations · 20 Apr 2018

MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam · 3DH · 20,599 citations · 17 Apr 2017