Why distillation helps: a statistical perspective
A. Menon, A. S. Rawat, Sashank J. Reddi, Seungyeon Kim, Sanjiv Kumar
21 May 2020 · arXiv:2005.10419 · FedML

Papers citing "Why distillation helps: a statistical perspective" (40 of 40 papers shown)
High-dimensional Analysis of Knowledge Distillation: Weak-to-Strong Generalization and Scaling Laws
M. E. Ildiz, Halil Alperen Gozeten, Ege Onur Taga, Marco Mondelli, Samet Oymak · 24 Oct 2024 · 5 citations

Towards Understanding Knowledge Distillation
Mary Phuong, Christoph H. Lampert · 27 May 2021 · 322 citations

Self-Distillation Amplifies Regularization in Hilbert Space
H. Mobahi, Mehrdad Farajtabar, Peter L. Bartlett · 13 Feb 2020 · 235 citations

Understanding and Improving Knowledge Distillation
Jiaxi Tang, Rakesh Shivanna, Zhe Zhao, Dong Lin, Anima Singh, Ed H. Chi, Sagar Jain · 10 Feb 2020 · 133 citations

Search to Distill: Pearls are Everywhere but not the Eyes
Yu Liu, Xuhui Jia, Mingxing Tan, Raviteja Vemulapalli, Yukun Zhu, Bradley Green, Xiaogang Wang · 20 Nov 2019 · 68 citations

Self-training with Noisy Student improves ImageNet classification
Qizhe Xie, Minh-Thang Luong, Eduard H. Hovy, Quoc V. Le · 11 Nov 2019 · 2,392 citations · NoLa

Distillation ≈ Early Stopping? Harvesting Dark Knowledge Utilizing Anisotropic Information Retrieval For Overparameterized Neural Network
Bin Dong, Jikai Hou, Yiping Lu, Zhihua Zhang · 02 Oct 2019 · 41 citations

Overfitting of neural nets under class imbalance: Analysis and improvements for segmentation
Zeju Li, Konstantinos Kamnitsas, Ben Glocker · 25 Jul 2019 · 94 citations · SSeg

Noise Regularization for Conditional Density Estimation
Jonas Rothfuss, Fabio Ferreira, S. Boehm, Simon Walther, Maxim Ulrich, Tamim Asfour, Andreas Krause · 21 Jul 2019 · 32 citations

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma · 18 Jun 2019 · 1,607 citations

When Does Label Smoothing Help?
Rafael Müller, Simon Kornblith, Geoffrey E. Hinton · 06 Jun 2019 · 1,953 citations · UQCV

Zero-Shot Knowledge Distillation in Deep Networks
Gaurav Kumar Nayak, Konda Reddy Mopuri, Vaisakh Shaj, R. Venkatesh Babu, Anirban Chakraborty · 20 May 2019 · 245 citations

Hypothesis Set Stability and Generalization
Dylan J. Foster, Spencer Greenberg, Satyen Kale, Haipeng Luo, M. Mohri, Karthik Sridharan · 09 Apr 2019 · 35 citations

Striking the Right Balance with Uncertainty
Salman Khan, Munawar Hayat, Waqas Zamir, Jianbing Shen, Ling Shao · 22 Jan 2019 · 174 citations

A Closer Look at Deep Learning Heuristics: Learning rate restarts, Warmup and Distillation
Akhilesh Deepak Gotmare, N. Keskar, Caiming Xiong, R. Socher · 29 Oct 2018 · 276 citations · ODL

Stochastic Negative Mining for Learning with Large Output Spaces
Sashank J. Reddi, Satyen Kale, Felix X. Yu, D. Holtmann-Rice, Jiecao Chen, Sanjiv Kumar · 16 Oct 2018 · 62 citations · NoLa

Ranking Distillation: Learning Compact Ranking Models With High Performance for Recommender System
Jiaxi Tang, Ke Wang · 19 Sep 2018 · 189 citations

Knowledge Distillation in Generations: More Tolerant Teachers Educate Better Students
Chenglin Yang, Lingxi Xie, Siyuan Qiao, Alan Yuille · 15 May 2018 · 136 citations

Born Again Neural Networks
Tommaso Furlanello, Zachary Chase Lipton, Michael Tschannen, Laurent Itti, Anima Anandkumar · 12 May 2018 · 1,033 citations

Large scale distributed neural network training through online distillation
Rohan Anil, Gabriel Pereyra, Alexandre Passos, Róbert Ormándi, George E. Dahl, Geoffrey E. Hinton · 09 Apr 2018 · 408 citations · FedML

Additive Margin Softmax for Face Verification
Feng Wang, Weiyang Liu, Haijun Liu, Jian Cheng · 17 Jan 2018 · 1,274 citations · CVBM

Data Distillation: Towards Omni-Supervised Learning
Ilija Radosavovic, Piotr Dollár, Ross B. Girshick, Georgia Gkioxari, Kaiming He · 12 Dec 2017 · 419 citations

mixup: Beyond Empirical Risk Minimization
Hongyi Zhang, Moustapha Cissé, Yann N. Dauphin, David Lopez-Paz · 25 Oct 2017 · 9,797 citations · NoLa

Sobolev Training for Neural Networks
Wojciech M. Czarnecki, Simon Osindero, Max Jaderberg, G. Swirszcz, Razvan Pascanu · 15 Jun 2017 · 247 citations

On Calibration of Modern Neural Networks
Chuan Guo, Geoff Pleiss, Yu Sun, Kilian Q. Weinberger · 14 Jun 2017 · 5,855 citations · UQCV

SphereFace: Deep Hypersphere Embedding for Face Recognition
Weiyang Liu, Yandong Wen, Zhiding Yu, Ming Li, Bhiksha Raj, Le Song · 26 Apr 2017 · 2,804 citations · CVBM

Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification
Maksim Lapin, Matthias Hein, Bernt Schiele · 12 Dec 2016 · 103 citations

Large-Margin Softmax Loss for Convolutional Neural Networks
Weiyang Liu, Yandong Wen, Zhiding Yu, Meng Yang · 07 Dec 2016 · 1,456 citations · CVBM

Patient-Driven Privacy Control through Generalized Distillation
Z. Berkay Celik, David Lopez-Paz, Patrick McDaniel · 26 Nov 2016 · 18 citations

Learning without Forgetting
Zhizhong Li, Derek Hoiem · 29 Jun 2016 · 4,423 citations · CLL, OOD, SSL

Rethinking the Inception Architecture for Computer Vision
Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, Z. Wojna · 02 Dec 2015 · 27,412 citations · 3DV, BDL

Policy Distillation
Andrei A. Rusu, Sergio Gomez Colmenarejo, Çağlar Gülçehre, Guillaume Desjardins, J. Kirkpatrick, Razvan Pascanu, Volodymyr Mnih, Koray Kavukcuoglu, R. Hadsell · 19 Nov 2015 · 695 citations

Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks
Nicolas Papernot, Patrick McDaniel, Xi Wu, S. Jha, A. Swami · 14 Nov 2015 · 3,077 citations · AAML

Unifying distillation and privileged information
David Lopez-Paz, Léon Bottou, Bernhard Schölkopf, V. Vapnik · 11 Nov 2015 · 463 citations · FedML

Recurrent Neural Network Training with Dark Knowledge Transfer
Zhiyuan Tang, Dong Wang, Zhiyong Zhang · 18 May 2015 · 109 citations

Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton, Oriol Vinyals, J. Dean · 09 Mar 2015 · 19,723 citations · FedML

Ranking via Robust Binary Classification and Parallel Parameter Estimation in Large-Scale Data
Hyokun Yun, Parameswaran Raman, S.V.N. Vishwanathan · 11 Feb 2014 · 28 citations

Do Deep Nets Really Need to be Deep?
Lei Jimmy Ba, R. Caruana · 21 Dec 2013 · 2,119 citations

Conformant Planning via Symbolic Model Checking
A. Cimatti, M. Roveri · 01 Jun 2011 · 961 citations

Empirical Bernstein Bounds and Sample Variance Penalization
Andreas Maurer, Massimiliano Pontil · 21 Jul 2009 · 545 citations