Why distillation helps: a statistical perspective
A. Menon, A. S. Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar
21 May 2020 · arXiv:2005.10419 · FedML

Papers citing "Why distillation helps: a statistical perspective" (40 of 40 papers shown)
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. E. Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, Samet Oymak · 24 Oct 2024 · 5 citations

Towards Understanding Knowledge Distillation
Mary Phuong, Christoph H. Lampert · 27 May 2021 · 322 citations

Self-Distillation Amplifies Regularization in Hilbert Space
H. Mobahi, Mehrdad Farajtabar, Peter L. Bartlett · 13 Feb 2020 · 235 citations

Understanding and Improving Knowledge Distillation
Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain · 10 Feb 2020 · 133 citations

Search to Distill: Pearls are Everywhere but not the Eyes
Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang · 20 Nov 2019 · 68 citations

Self-training with Noisy Student improves ImageNet classification
Qizhe Xie, Minh-Thang Luong, Eduard H. Hovy, Quoc V. Le · 11 Nov 2019 · 2,392 citations · NoLa

Distillation ≈ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network
Bin Dong, Jikai Hou, Yiping Lu, Zhihua Zhang · 02 Oct 2019 · 41 citations

Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation
Zeju Li, Konstantinos Kamnitsas, Ben Glocker · 25 Jul 2019 · 94 citations · SSeg

Noise Regularization for Conditional Density Estimation
Jonas Rothfuss, Fabio Ferreira, S. Boehm, Simon Walther, Maxim Ulrich, Tamim Asfour, Andreas Krause · 21 Jul 2019 · 32 citations

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma · 18 Jun 2019 · 1,607 citations

When Does Label Smoothing Help?
Rafael Müller, Simon Kornblith, Geoffrey E. Hinton · 06 Jun 2019 · 1,953 citations · UQCV

Zero-Shot Knowledge Distillation in Deep Networks
Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty · 20 May 2019 · 245 citations

Hypothesis Set Stability and Generalization
Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, M. Mohri, Karthik Sridharan · 09 Apr 2019 · 35 citations

Striking the Right Balance with Uncertainty
Salman Khan, Munawar Hayat, Waqas Zamir, Jianbing Shen, Ling Shao · 22 Jan 2019 · 174 citations

A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Akhilesh Deepak Gotmare, N. Keskar, Caiming Xiong, R. Socher · 29 Oct 2018 · 276 citations · ODL

Stochastic Negative Mining for Learning with Large Output Spaces
Sashank J. Reddi, Satyen Kale, Felix X. Yu, D. Holtmann-Rice, Jiecao Chen, Sanjiv Kumar · 16 Oct 2018 · 62 citations · NoLa

Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System
Jiaxi Tang, Ke Wang · 19 Sep 2018 · 189 citations

Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
Chenglin Yang, Lingxi Xie, Siyuan Qiao, Alan Yuille · 15 May 2018 · 136 citations

Born Again Neural Networks
Tommaso Furlanello, Zachary Chase Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar · 12 May 2018 · 1,033 citations

Large scale distributed neural network training through online distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton · 09 Apr 2018 · 408 citations · FedML

Additive Margin Softmax for Face Verification
Feng Wang, Weiyang Liu, Haijun Liu, Jian Cheng · 17 Jan 2018 · 1,274 citations · CVBM

Data Distillation: Towards Omni-Supervised Learning
Ilija Radosavovic, Piotr Dollár, Ross B. Girshick, Georgia Gkioxari, Kaiming He · 12 Dec 2017 · 419 citations

mixup: Beyond Empirical Risk Minimization
Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, David Lopez-Paz · 25 Oct 2017 · 9,797 citations · NoLa

Sobolev Training for Neural Networks
Wojciech M. Czarnecki, Simon Osindero, Max Jaderberg, G. Swirszcz, Razvan Pascanu · 15 Jun 2017 · 247 citations

On Calibration of Modern Neural Networks
Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger · 14 Jun 2017 · 5,855 citations · UQCV

SphereFace: Deep Hypersphere Embedding for Face Recognition
Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song · 26 Apr 2017 · 2,804 citations · CVBM

Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification
Maksim Lapin, Matthias Hein, Bernt Schiele · 12 Dec 2016 · 103 citations

Large-Margin Softmax Loss for Convolutional Neural Networks
Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang · 07 Dec 2016 · 1,456 citations · CVBM

Patient-Driven Privacy Control through Generalized Distillation
Z. Berkay Celik, David Lopez-Paz, Patrick McDaniel · 26 Nov 2016 · 18 citations

Learning without Forgetting
Zhizhong Li, Derek Hoiem · 29 Jun 2016 · 4,423 citations · CLL, OOD, SSL

Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Z. Wojna · 02 Dec 2015 · 27,412 citations · 3DV, BDL

Policy Distillation
Andrei A. Rusu, Sergio Gomez Colmenarejo, Çağlar Gülçehre, Guillaume Desjardins, J. Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, R. Hadsell · 19 Nov 2015 · 695 citations

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Nicolas Papernot, Patrick McDaniel, Xi Wu, S. Jha, A. Swami · 14 Nov 2015 · 3,077 citations · AAML

Unifying distillation and privileged information
David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, V. Vapnik · 11 Nov 2015 · 463 citations · FedML

Recurrent Neural Network Training with Dark Knowledge Transfer
Zhiyuan Tang, Dong Wang, Zhiyong Zhang · 18 May 2015 · 109 citations

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean · 09 Mar 2015 · 19,723 citations · FedML

Ranking via Robust Binary Classification and Parallel Parameter Estimation in Large-Scale Data
Hyokun Yun, Parameswaran Raman, S.V.N. Vishwanathan · 11 Feb 2014 · 28 citations

Do Deep Nets Really Need to be Deep?
Lei Jimmy Ba, R. Caruana · 21 Dec 2013 · 2,119 citations

Conformant Planning via Symbolic Model Checking
A. Cimatti, M. Roveri · 01 Jun 2011 · 961 citations

Empirical Bernstein Bounds and Sample Variance Penalization
Andreas Maurer, Massimiliano Pontil · 21 Jul 2009 · 545 citations