Prototype-guided Cross-task Knowledge Distillation for Large-scale Models

arXiv: 2212.13180
Date: 26 December 2022
Authors: Deng Li, Aming Wu, Yahong Han, Qingwen Tian
Tags: VLM

Papers citing "Prototype-guided Cross-task Knowledge Distillation for Large-scale Models" (42 of 42 papers shown)

Knowledge Distillation Using Hierarchical Self-Supervision Augmented Distribution
Authors: Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu
Citations: 15
Date: 07 Sep 2021

Distilling a Powerful Student Model via Online Knowledge Distillation
Authors: Shaojie Li, Mingbao Lin, Yan Wang, Yongjian Wu, Yonghong Tian, Ling Shao, Rongrong Ji
Tags: FedML
Citations: 47
Date: 26 Mar 2021

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Authors: Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, B. Guo
Tags: ViT
Citations: 21,051
Date: 25 Mar 2021

Training data-efficient image transformers & distillation through attention
Authors: Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
Tags: ViT
Citations: 6,657
Date: 23 Dec 2020

Transformer Interpretability Beyond Attention Visualization
Authors: Hila Chefer, Shir Gur, Lior Wolf
Citations: 652
Date: 17 Dec 2020

Cross-Layer Distillation with Semantic Calibration
Authors: Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Yan Feng, Chun Chen
Tags: FedML
Citations: 294
Date: 06 Dec 2020

Channel-wise Knowledge Distillation for Dense Prediction
Authors: Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen
Citations: 261
Date: 26 Nov 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Authors: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
Tags: ViT
Citations: 40,217
Date: 22 Oct 2020

Deformable DETR: Deformable Transformers for End-to-End Object Detection
Authors: Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
Tags: ViT
Citations: 4,993
Date: 08 Oct 2020

Knowledge Distillation: A Survey
Authors: Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao
Tags: VLM
Citations: 2,907
Date: 09 Jun 2020

End-to-End Object Detection with Transformers
Authors: Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko
Tags: ViT, 3DV, PINN
Citations: 12,847
Date: 26 May 2020

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Authors: Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu
Tags: MLLM, VLM, OffRL, AI4TS
Citations: 496
Date: 01 May 2020

Designing Network Design Spaces
Authors: Ilija Radosavovic, Raj Prateek Kosaraju, Ross B. Girshick, Kaiming He, Piotr Dollár
Tags: GNN
Citations: 1,672
Date: 30 Mar 2020

Prototype Rectification for Few-Shot Learning
Authors: Jinlu Liu, Liang Song, Yongqiang Qin
Citations: 247
Date: 25 Nov 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Authors: Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
Citations: 7,386
Date: 02 Oct 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Authors: Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Tags: VLM
Citations: 1,838
Date: 23 Sep 2019

Patient Knowledge Distillation for BERT Model Compression
Authors: S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
Citations: 833
Date: 25 Aug 2019

VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Authors: Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
Tags: VLM, MLLM, SSL
Citations: 1,657
Date: 22 Aug 2019

LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Authors: Hao Tan, Mohit Bansal
Tags: VLM, MLLM
Citations: 2,467
Date: 20 Aug 2019

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Authors: Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma
Citations: 1,583
Date: 18 Jun 2019

Distilling Object Detectors with Fine-grained Feature Imitation
Authors: Tao Wang, Li-xin Yuan, Xiaopeng Zhang, Jiashi Feng
Tags: ObjD
Citations: 379
Date: 09 Jun 2019

When Does Label Smoothing Help?
Authors: Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
Tags: UQCV
Citations: 1,931
Date: 06 Jun 2019

Variational Information Distillation for Knowledge Transfer
Authors: SungSoo Ahn, S. Hu, Andreas C. Damianou, Neil D. Lawrence, Zhenwen Dai
Citations: 612
Date: 11 Apr 2019

Relational Knowledge Distillation
Authors: Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho
Citations: 1,396
Date: 10 Apr 2019

Cross-Modal Self-Attention Network for Referring Image Segmentation
Authors: Linwei Ye, Mrigank Rochan, Zhi Liu, Yang Wang
Tags: EgoV
Citations: 472
Date: 09 Apr 2019

A Comprehensive Overhaul of Feature Distillation
Authors: Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, J. Choi
Citations: 571
Date: 03 Apr 2019

Class-Balanced Loss Based on Effective Number of Samples
Authors: Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge J. Belongie
Citations: 2,253
Date: 16 Jan 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Tags: VLM, SSL, SSeg
Citations: 93,936
Date: 11 Oct 2018

Learning Deep Representations with Probabilistic Knowledge Transfer
Authors: Nikolaos Passalis, Anastasios Tefas
Citations: 407
Date: 28 Mar 2018

To prune, or not to prune: exploring the efficacy of pruning for model compression
Authors: Michael Zhu, Suyog Gupta
Citations: 1,262
Date: 05 Oct 2017

Semantic Foggy Scene Understanding with Synthetic Data
Authors: Christos Sakaridis, Dengxin Dai, Luc Van Gool
Citations: 1,094
Date: 25 Aug 2017

Deep Hashing Network for Unsupervised Domain Adaptation
Authors: Hemanth Venkateswara, José Eusébio, Shayok Chakraborty, S. Panchanathan
Tags: OOD
Citations: 2,023
Date: 22 Jun 2017

Attention Is All You Need
Authors: Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
Tags: 3DV
Citations: 129,831
Date: 12 Jun 2017

Learning multiple visual domains with residual adapters
Authors: Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi
Tags: OOD
Citations: 924
Date: 22 May 2017

Prototypical Networks for Few-shot Learning
Authors: Jake C. Snell, Kevin Swersky, R. Zemel
Citations: 8,072
Date: 15 Mar 2017

Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Authors: Sergey Zagoruyko, N. Komodakis
Citations: 2,561
Date: 12 Dec 2016

The Cityscapes Dataset for Semantic Urban Scene Understanding
Authors: Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele
Citations: 11,540
Date: 06 Apr 2016

Quantized Convolutional Neural Networks for Mobile Devices
Authors: Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
Tags: MQ
Citations: 1,162
Date: 21 Dec 2015

Distilling the Knowledge in a Neural Network
Authors: Geoffrey E. Hinton, Oriol Vinyals, J. Dean
Tags: FedML
Citations: 19,448
Date: 09 Mar 2015

FitNets: Hints for Thin Deep Nets
Authors: Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, C. Gatta, Yoshua Bengio
Tags: FedML
Citations: 3,862
Date: 19 Dec 2014

Microsoft COCO: Common Objects in Context
Authors: Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, C. L. Zitnick, Piotr Dollár
Tags: ObjD
Citations: 43,290
Date: 01 May 2014

Do Deep Nets Really Need to be Deep?
Authors: Lei Jimmy Ba, R. Caruana
Citations: 2,114
Date: 21 Dec 2013