arXiv:2005.10419
Why distillation helps: a statistical perspective
21 May 2020
A. Menon
A. S. Rawat
Sashank J. Reddi
Seungyeon Kim
Sanjiv Kumar
Papers citing "Why distillation helps: a statistical perspective" (40 of 40 papers shown)
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. E. Ildiz
Halil Alperen Gozeten
Ege Onur Taga
Marco Mondelli
Samet Oymak
24 Oct 2024
Towards Understanding Knowledge Distillation
Mary Phuong
Christoph H. Lampert
27 May 2021
Self-Distillation Amplifies Regularization in Hilbert Space
H. Mobahi
Mehrdad Farajtabar
Peter L. Bartlett
13 Feb 2020
Understanding and Improving Knowledge Distillation
Jiaxi Tang
Rakesh Shivanna
Zhe Zhao
Dong Lin
Anima Singh
Ed H. Chi
Sagar Jain
10 Feb 2020
Search to Distill: Pearls are Everywhere but not the Eyes
Yu Liu
Xuhui Jia
Mingxing Tan
Raviteja Vemulapalli
Yukun Zhu
Bradley Green
Xiaogang Wang
20 Nov 2019
Self-training with Noisy Student improves ImageNet classification
Qizhe Xie
Minh-Thang Luong
Eduard H. Hovy
Quoc V. Le
11 Nov 2019
Distillation ≈ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network
Bin Dong
Jikai Hou
Yiping Lu
Zhihua Zhang
02 Oct 2019
Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation
Zeju Li
Konstantinos Kamnitsas
Ben Glocker
25 Jul 2019
Noise Regularization for Conditional Density Estimation
Jonas Rothfuss
Fabio Ferreira
S. Boehm
Simon Walther
Maxim Ulrich
Tamim Asfour
Andreas Krause
21 Jul 2019
Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Kaidi Cao
Colin Wei
Adrien Gaidon
Nikos Arechiga
Tengyu Ma
18 Jun 2019
When Does Label Smoothing Help?
Rafael Müller
Simon Kornblith
Geoffrey E. Hinton
06 Jun 2019
Zero-Shot Knowledge Distillation in Deep Networks
Gaurav Kumar Nayak
Konda Reddy Mopuri
Vaisakh Shaj
R. Venkatesh Babu
Anirban Chakraborty
20 May 2019
Hypothesis Set Stability and Generalization
Dylan J. Foster
Spencer Greenberg
Satyen Kale
Haipeng Luo
M. Mohri
Karthik Sridharan
09 Apr 2019
Striking the Right Balance with Uncertainty
Salman Khan
Munawar Hayat
Waqas Zamir
Jianbing Shen
Ling Shao
22 Jan 2019
A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Akhilesh Deepak Gotmare
N. Keskar
Caiming Xiong
R. Socher
29 Oct 2018
Stochastic Negative Mining for Learning with Large Output Spaces
Sashank J. Reddi
Satyen Kale
Felix X. Yu
D. Holtmann-Rice
Jiecao Chen
Sanjiv Kumar
16 Oct 2018
Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System
Jiaxi Tang
Ke Wang
19 Sep 2018
Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
Chenglin Yang
Lingxi Xie
Siyuan Qiao
Alan Yuille
15 May 2018
Born Again Neural Networks
Tommaso Furlanello
Zachary Chase Lipton
Michael Tschannen
Laurent Itti
Anima Anandkumar
12 May 2018
Large scale distributed neural network training through online distillation
Rohan Anil
Gabriel Pereyra
Alexandre Passos
Róbert Ormándi
George E. Dahl
Geoffrey E. Hinton
09 Apr 2018
Additive Margin Softmax for Face Verification
Feng Wang
Weiyang Liu
Haijun Liu
Jian Cheng
17 Jan 2018
Data Distillation: Towards Omni-Supervised Learning
Ilija Radosavovic
Piotr Dollár
Ross B. Girshick
Georgia Gkioxari
Kaiming He
12 Dec 2017
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
25 Oct 2017
Sobolev Training for Neural Networks
Wojciech M. Czarnecki
Simon Osindero
Max Jaderberg
G. Swirszcz
Razvan Pascanu
15 Jun 2017
On Calibration of Modern Neural Networks
Chuan Guo
Geoff Pleiss
Yu Sun
Kilian Q. Weinberger
14 Jun 2017
SphereFace: Deep Hypersphere Embedding for Face Recognition
Weiyang Liu
Yandong Wen
Zhiding Yu
Ming Li
Bhiksha Raj
Le Song
26 Apr 2017
Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification
Maksim Lapin
Matthias Hein
Bernt Schiele
12 Dec 2016
Large-Margin Softmax Loss for Convolutional Neural Networks
Weiyang Liu
Yandong Wen
Zhiding Yu
Meng Yang
07 Dec 2016
Patient-Driven Privacy Control through Generalized Distillation
Z. Berkay Celik
David Lopez-Paz
Patrick McDaniel
26 Nov 2016
Learning without Forgetting
Zhizhong Li
Derek Hoiem
29 Jun 2016
Rethinking the Inception Architecture for Computer Vision
Christian Szegedy
Vincent Vanhoucke
Sergey Ioffe
Jonathon Shlens
Z. Wojna
02 Dec 2015
Policy Distillation
Andrei A. Rusu
Sergio Gomez Colmenarejo
Çağlar Gülçehre
Guillaume Desjardins
J. Kirkpatrick
Razvan Pascanu
Volodymyr Mnih
Koray Kavukcuoglu
R. Hadsell
19 Nov 2015
Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Nicolas Papernot
Patrick McDaniel
Xi Wu
S. Jha
A. Swami
14 Nov 2015
Unifying distillation and privileged information
David Lopez-Paz
Léon Bottou
Bernhard Schölkopf
V. Vapnik
11 Nov 2015
Recurrent Neural Network Training with Dark Knowledge Transfer
Zhiyuan Tang
Dong Wang
Zhiyong Zhang
18 May 2015
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
09 Mar 2015
Ranking via Robust Binary Classification and Parallel Parameter Estimation in Large-Scale Data
Hyokun Yun
Parameswaran Raman
S.V.N. Vishwanathan
11 Feb 2014
Do Deep Nets Really Need to be Deep?
Lei Jimmy Ba
R. Caruana
21 Dec 2013
Conformant Planning via Symbolic Model Checking
A. Cimatti
M. Roveri
01 Jun 2011
Empirical Bernstein Bounds and Sample Variance Penalization
Andreas Maurer
Massimiliano Pontil
21 Jul 2009