Self-Distillation as Instance-Specific Label Smoothing
arXiv:2006.05065 · 9 June 2020
Zhilu Zhang, M. Sabuncu
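For readers skimming this cited-by list: the cited paper's core observation is that self-distillation, i.e., retraining a network on the temperature-softened predictions of an earlier generation of itself, behaves like label smoothing whose smoothing distribution is specific to each training instance rather than uniform. The snippet below is a minimal PyTorch sketch of a generic self-distillation objective of this kind; the function name, the alpha and temperature values, and the loss weighting are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn.functional as F

def self_distillation_loss(student_logits, teacher_logits, targets,
                           alpha=0.5, temperature=4.0):
    """Generic self-distillation objective (illustrative sketch, not the paper's exact loss).

    Combines cross-entropy on the hard labels with a KL term pulling the
    student toward the softened predictions of an earlier generation of the
    same model (the "teacher" in self-distillation).
    """
    # Standard cross-entropy against the ground-truth (hard) labels.
    ce = F.cross_entropy(student_logits, targets)

    # KL divergence between the student's and the teacher's temperature-softened
    # distributions; the T^2 factor is the usual gradient-scale correction.
    t = temperature
    soft_teacher = F.softmax(teacher_logits.detach() / t, dim=-1)
    log_soft_student = F.log_softmax(student_logits / t, dim=-1)
    kd = F.kl_div(log_soft_student, soft_teacher, reduction="batchmean") * (t * t)

    # Equivalent reading: the effective target is the instance-specific mixture
    # (1 - alpha) * one_hot(y) + alpha * teacher_soft, i.e. label smoothing whose
    # smoothing distribution depends on the example.
    return (1.0 - alpha) * ce + alpha * kd

# Toy usage with random logits for a 10-class problem.
if __name__ == "__main__":
    student_logits = torch.randn(8, 10, requires_grad=True)
    teacher_logits = torch.randn(8, 10)
    targets = torch.randint(0, 10, (8,))
    loss = self_distillation_loss(student_logits, teacher_logits, targets)
    loss.backward()
    print(float(loss))
```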

Papers citing "Self-Distillation as Instance-Specific Label Smoothing"

17 of 17 papers shown.

  1. From Large to Super-Tiny: End-to-End Optimization for Cost-Efficient LLMs. Jiliang Ni, Jiachen Pu, Zhongyi Yang, Kun Zhou, Hui Wang, Xiaoliang Xiao, Dakui Wang, Xin Li, Jingfeng Luo, Conggang Hu. 18 Apr 2025.
  2. sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging. Jingyuan Chen, Yuan Yao, Mie Anderson, Natalie Hauglund, Celia Kjaerby, Verena Untiet, Maiken Nedergaard, Jiebo Luo. 28 Jan 2025.
  3. The Effect of Optimal Self-Distillation in Noisy Gaussian Mixture Model. Kaito Takanami, Takashi Takahashi, Ayaka Sakata. 27 Jan 2025.
  4. Distilling Aggregated Knowledge for Weakly-Supervised Video Anomaly Detection. Jash Dalvi, Ali Dabouei, Gunjan Dhanuka, Min Xu. 05 Jun 2024.
  5. Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning. Runqian Wang, Soumya Ghosh, David D. Cox, Diego Antognini, Aude Oliva, Rogerio Feris, Leonid Karlinsky. 27 May 2024.
  6. An Empirical Investigation into the Effect of Parameter Choices in Knowledge Distillation. Md Arafat Sultan, Aashka Trivedi, Parul Awasthy, Avirup Sil. 12 Jan 2024.
  7. Revisiting Token Dropping Strategy in Efficient BERT Pretraining. Qihuang Zhong, Liang Ding, Juhua Liu, Xuebo Liu, Min Zhang, Bo Du, Dacheng Tao. 24 May 2023. [VLM]
  8. Generalization Matters: Loss Minima Flattening via Parameter Hybridization for Efficient Online Knowledge Distillation. Tianli Zhang, Mengqi Xue, Jiangtao Zhang, Haofei Zhang, Yu Wang, Lechao Cheng, Jie Song, Mingli Song. 26 Mar 2023.
  9. Boosting Graph Neural Networks via Adaptive Knowledge Distillation. Zhichun Guo, Chunhui Zhang, Yujie Fan, Yijun Tian, Chuxu Zhang, Nitesh V. Chawla. 12 Oct 2022.
  10. Spot-adaptive Knowledge Distillation. Jie Song, Ying Chen, Jingwen Ye, Mingli Song. 05 May 2022.
  11. Better Supervisory Signals by Observing Learning Paths. Yi Ren, Shangmin Guo, Danica J. Sutherland. 04 Mar 2022.
  12. Dynamic Rectification Knowledge Distillation. Fahad Rahman Amik, Ahnaf Ismat Tasin, Silvia Ahmed, M. M. L. Elahi, Nabeel Mohammed. 27 Jan 2022.
  13. Learning with Label Noise for Image Retrieval by Selecting Interactions. Sarah Ibrahimi, Arnaud Sors, Rafael Sampaio de Rezende, S. Clinchant. 20 Dec 2021. [NoLa, VLM]
  14. Federated Few-Shot Learning with Adversarial Learning. Chenyou Fan, Jianwei Huang. 01 Apr 2021. [FedML]
  15. Knowledge Distillation: A Survey. Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao. 09 Jun 2020. [VLM]
  16. Knowledge Distillation by On-the-Fly Native Ensemble. Xu Lan, Xiatian Zhu, S. Gong. 12 Jun 2018.
  17. Mean teachers are better role models: Weight-averaged consistency targets improve semi-supervised deep learning results. Antti Tarvainen, Harri Valpola. 06 Mar 2017. [OOD, MoMe]