Supervision Complexity and its Role in Knowledge Distillation

28 January 2023
Hrayr Harutyunyan, A. S. Rawat, A. Menon, Seungyeon Kim, Sanjiv Kumar

Papers citing "Supervision Complexity and its Role in Knowledge Distillation"

13 / 13 papers shown

Efficient Knowledge Distillation via Curriculum Extraction
Shivam Gupta, Sushrut Karmalkar
21 Mar 2025

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, ..., Rakesh Shivanna, Sashank J. Reddi, A. Menon, Rohan Anil, Sanjiv Kumar
24 Oct 2024

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. E. Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, Samet Oymak
24 Oct 2024

The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information
Diyuan Wu, Ionut-Vlad Modoranu, M. Safaryan, Denis Kuznedelev, Dan Alistarh
30 Aug 2024

Learning Neural Networks with Sparse Activations
Pranjal Awasthi, Nishanth Dikkala, Pritish Kamath, Raghu Meka
26 Jun 2024

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Qingyue Zhao, Banghua Zhu
11 Oct 2023

Data Upcycling Knowledge Distillation for Image Super-Resolution
Yun-feng Zhang, Wei Li, Simiao Li, Hanting Chen, Zhaopeng Tu, Wenjun Wang, Bingyi Jing, Hai-lin Wang, Jie Hu
25 Sep 2023

Cluster-aware Semi-supervised Learning: Relational Knowledge Distillation Provably Learns Clustering
Yijun Dong, Kevin Miller, Qiuyu Lei, Rachel A. Ward
20 Jul 2023

On student-teacher deviations in distillation: does it pay to disobey?
Vaishnavh Nagarajan, A. Menon, Srinadh Bhojanapalli, H. Mobahi, Sanjiv Kumar
30 Jan 2023

Pro-KD: Progressive Distillation by Following the Footsteps of the Teacher
Mehdi Rezagholizadeh, A. Jafari, Puneeth Salad, Pranav Sharma, Ali Saheb Pasand, A. Ghodsi
16 Oct 2021

A linearized framework and a new benchmark for model selection for fine-tuning
Aditya Deshpande, Alessandro Achille, Avinash Ravichandran, Hao Li, L. Zancato, Charless C. Fowlkes, Rahul Bhotika, Stefano Soatto, Pietro Perona
29 Jan 2021

Knowledge Distillation in Wide Neural Networks: Risk Bound, Data Efficiency and Imperfect Teacher
Guangda Ji, Zhanxing Zhu
20 Oct 2020

Large scale distributed neural network training through online distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton
09 Apr 2018