Self-Distillation Amplifies Regularization in Hilbert Space
13 February 2020
H. Mobahi, Mehrdad Farajtabar, Peter L. Bartlett
arXiv: 2002.05715

Papers citing "Self-Distillation Amplifies Regularization in Hilbert Space" (50 of 149 papers shown)

Learning from Stochastic Teacher Representations Using Student-Guided Knowledge Distillation
Muhammad Haseeb Aslam, Clara Martinez, M. Pedersoli, Alessandro Lameiras Koerich, Ali Etemad, Eric Granger
19 Apr 2025

CR-CTC: Consistency regularization on CTC for improved speech recognition
Zengwei Yao, Wei Kang, Xiaoyu Yang, Fangjun Kuang, Liyong Guo, Han Zhu, Zengrui Jin, Zhaoqing Li, Long Lin, Daniel Povey
17 Feb 2025

The Best Instruction-Tuning Data are Those That Fit
Dylan Zhang, Qirun Dai, Hao Peng
Tags: ALM
06 Feb 2025

sDREAMER: Self-distilled Mixture-of-Modality-Experts Transformer for Automatic Sleep Staging
Jingyuan Chen, Yuan Yao, Mie Anderson, Natalie Hauglund, Celia Kjaerby, Verena Untiet, Maiken Nedergaard, Jiebo Luo
28 Jan 2025

TAID: Temporally Adaptive Interpolated Distillation for Efficient Knowledge Transfer in Language Models
Makoto Shing, Kou Misaki, Han Bao, Sho Yokoi, Takuya Akiba
Tags: VLM
28 Jan 2025

Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision
Xiangzhong Luo, Di Liu, Hao Kong, Shuo Huai, Hui Chen, Guochu Xiong, Weichen Liu
03 Nov 2024

Universality of the π^2/6 Pathway in Avoiding Model Collapse
Apratim Dey, D. Donoho
30 Oct 2024

Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in BabyLlama with Reverse KL Divergence
Shaozhen Shi, Yevgen Matusevych, Malvina Nissim
29 Oct 2024

A Little Help Goes a Long Way: Efficient LLM Training by Leveraging Small LMs
A. S. Rawat, Veeranjaneyulu Sadhanala, Afshin Rostamizadeh, Ayan Chakrabarti, Wittawat Jitkrittum, ..., Rakesh Shivanna, Sashank J. Reddi, A. Menon, Rohan Anil, Sanjiv Kumar
24 Oct 2024

High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. E. Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, Samet Oymak
24 Oct 2024

Collapse or Thrive? Perils and Promises of Synthetic Data in a Self-Generating World
Joshua Kazdan, Rylan Schaeffer, Apratim Dey, Matthias Gerstgrasser, Rafael Rafailov, D. Donoho, Sanmi Koyejo
22 Oct 2024

Provable Weak-to-Strong Generalization via Benign Overfitting
David X. Wu, A. Sahai
06 Oct 2024

Practical token pruning for foundation models in few-shot conversational virtual assistant systems
Haode Qi, Cheng Qian, Jian Ni, Pratyush Singh, Reza Fazeli, Gengyu Wang, Zhongzheng Shu, Eric Wayne, Juergen Bross
21 Aug 2024

How to Train the Teacher Model for Effective Knowledge Distillation
Shayan Mohajer Hamidi, Xizhen Deng, Renhao Tan, Linfeng Ye, Ahmed H. Salamah
25 Jul 2024

Understanding the Gains from Repeated Self-Distillation
Divyansh Pareek, Simon S. Du, Sewoong Oh
05 Jul 2024

InFiConD: Interactive No-code Fine-tuning with Concept-based Knowledge Distillation
Jinbin Huang, Wenbin He, Liang Gou, Liu Ren, Chris Bryan
25 Jun 2024

Retraining with Predicted Hard Labels Provably Increases Model Accuracy
Rudrajit Das, Inderjit S Dhillon, Alessandro Epasto, Adel Javanmard, Jieming Mao, Vahab Mirrokni, Sujay Sanghavi, Peilin Zhong
17 Jun 2024

Beyond Model Collapse: Scaling Up with Synthesized Data Requires Reinforcement
Yunzhen Feng, Elvis Dohmatob, Pu Yang, Francois Charton, Julia Kempe
11 Jun 2024

Provable Contrastive Continual Learning
Yichen Wen, Zhiquan Tan, Kaipeng Zheng, Chuanlong Xie, Weiran Huang
Tags: CLL
29 May 2024

Trans-LoRA: towards data-free Transferable Parameter Efficient Finetuning
Runqian Wang, Soumya Ghosh, David D. Cox, Diego Antognini, Aude Oliva, Rogerio Feris, Leonid Karlinsky
27 May 2024

Quantifying the Gain in Weak-to-Strong Generalization
Moses Charikar, Chirag Pabbaraju, Kirankumar Shiragur
Tags: ELM
24 May 2024

Tailoring Vaccine Messaging with Common-Ground Opinions
Rickard Stureborg, Sanxing Chen, Ruoyu Xie, Aayushi Patel, Christopher Li, Chloe Qinyu Zhu, Tingnan Hu, Jun Yang, Bhuwan Dhingra
17 May 2024

Reliable Model Watermarking: Defending Against Theft without Compromising on Evasion
Markus Frey, Sichu Liang, Wentao Hu, Matthias Nau, Ju Jia, Shilin Wang
Tags: AAML
21 Apr 2024

Iterated Learning Improves Compositionality in Large Vision-Language Models
Chenhao Zheng, Jieyu Zhang, Aniruddha Kembhavi, Ranjay Krishna
Tags: VLM, CoGe
02 Apr 2024

Align-to-Distill: Trainable Attention Alignment for Knowledge Distillation in Neural Machine Translation
Heegon Jin, Seonil Son, Jemin Park, Youngseok Kim, Hyungjong Noh, Yeonsoo Lee
03 Mar 2024

DNA Family: Boosting Weight-Sharing NAS with Block-Wise Supervisions
Guangrun Wang, Changlin Li, Liuchun Yuan, Jiefeng Peng, Xiaoyu Xian, Xiaodan Liang, Xiaojun Chang, Liang Lin
02 Mar 2024

Layer-wise Regularized Dropout for Neural Language Models
Shiwen Ni, Min Yang, Ruifeng Xu, Chengming Li, Xiping Hu
26 Feb 2024

Model Collapse Demystified: The Case of Regression
Elvis Dohmatob, Yunzhen Feng, Julia Kempe
12 Feb 2024

A Tale of Tails: Model Collapse as a Change of Scaling Laws
Elvis Dohmatob, Yunzhen Feng, Pu Yang, Francois Charton, Julia Kempe
10 Feb 2024

Distillation Enhanced Time Series Forecasting Network with Momentum Contrastive Learning
Haozhi Gao, Qianqian Ren, Jinbao Li
Tags: AI4TS
31 Jan 2024

EPSD: Early Pruning with Self-Distillation for Efficient Model Compression
Dong Chen, Ning Liu, Yichen Zhu, Zhengping Che, Rui Ma, Fachao Zhang, Xiaofeng Mou, Yi Chang, Jian Tang
31 Jan 2024

Knowledge Distillation on Spatial-Temporal Graph Convolutional Network for Traffic Prediction
Mohammad Izadi, M. Safayani, Abdolreza Mirzaei
22 Jan 2024

Bayes Conditional Distribution Estimation for Knowledge Distillation Based on Conditional Mutual Information
Linfeng Ye, Shayan Mohajer Hamidi, Renhao Tan, En-Hui Yang
Tags: VLM
16 Jan 2024

Turning Dust into Gold: Distilling Complex Reasoning Capabilities from LLMs by Leveraging Negative Data
Yiwei Li, Peiwen Yuan, Shaoxiong Feng, Boyuan Pan, Bin Sun, Xinglin Wang, Heda Wang, Kan Li
Tags: LRM
20 Dec 2023

Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition
Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha
Tags: CLL
20 Dec 2023

CR-SFP: Learning Consistent Representation for Soft Filter Pruning
Jingyang Xiang, Zhuangzhi Chen, Jianbiao Mei, Siqi Li, Jun Chen, Yong-Jin Liu
17 Dec 2023

Student as an Inherent Denoiser of Noisy Teacher
Jiachen Zhao
15 Dec 2023

Towards Generalized Multi-stage Clustering: Multi-view Self-distillation
Jiatai Wang, Zhiwei Xu, Xin Wang, Tao Li
29 Oct 2023

Improving Compositional Generalization Using Iterated Learning and Simplicial Embeddings
Yi Ren, Samuel Lavoie, Mikhail Galkin, Danica J. Sutherland, Aaron Courville
28 Oct 2023

DistillCSE: Distilled Contrastive Learning for Sentence Embeddings
Jiahao Xu, Wei Shao, Lihui Chen, Lemao Liu
Tags: FedML
20 Oct 2023

Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Qingyue Zhao, Banghua Zhu
11 Oct 2023

RAI4IoE: Responsible AI for Enabling the Internet of Energy
Minhui Xue, Surya Nepal, Ling Liu, Subbu Sethuvenkatraman, Xingliang Yuan, Carsten Rudolph, Ruoxi Sun, Greg Eisenhauer
20 Sep 2023

VERSE: Virtual-Gradient Aware Streaming Lifelong Learning with Anytime Inference
S. Banerjee, Vinay Kumar Verma, Avideep Mukherjee, Deepak Gupta, Vinay P. Namboodiri, Piyush Rai
Tags: CLL
15 Sep 2023

Modify Training Directions in Function Space to Reduce Generalization Error
Yi Yu, Wenlian Lu, Boyu Chen
25 Jul 2023

Categories of Response-Based, Feature-Based, and Relation-Based Knowledge Distillation
Chuanguang Yang, Xinqiang Yu, Zhulin An, Yongjun Xu
Tags: VLM, OffRL
19 Jun 2023

Revisit Few-shot Intent Classification with PLMs: Direct Fine-tuning vs. Continual Pre-training
Haode Zhang, Haowen Liang, Li-Ming Zhan, Xiao-Ming Wu, Albert Y. S. Lam
Tags: VLM
08 Jun 2023

Parallel Neurosymbolic Integration with Concordia
Jonathan Feldstein, Modestas Jurcius, Efthymia Tsamoura
Tags: NAI
01 Jun 2023

Adaptive Self-Distillation for Minimizing Client Drift in Heterogeneous Federated Learning
M. Yashwanth, Gaurav Kumar Nayak, Aryaveer Singh, Yogesh Singh, Anirban Chakraborty
Tags: FedML
31 May 2023

Knowledge Distillation Performs Partial Variance Reduction
M. Safaryan, Alexandra Peste, Dan Alistarh
27 May 2023

Disentangled Phonetic Representation for Chinese Spelling Correction
Zihong Liang, Xiaojun Quan, Qifan Wang
24 May 2023