Prototype-guided Cross-task Knowledge Distillation for Large-scale Models

arXiv: 2212.13180
Date: 26 December 2022
Authors: Deng Li, Aming Wu, Yahong Han, Qingwen Tian
Tags: VLM

Papers citing "Prototype-guided Cross-task Knowledge Distillation for Large-scale Models" (42 of 42 papers shown)

Knowledge Distillation Using Hierarchical Self-Supervision Augmented Distribution
Authors: Chuanguang Yang, Zhulin An, Linhang Cai, Yongjun Xu
Citations: 15
Date: 07 Sep 2021

Distilling a Powerful Student Model via Online Knowledge Distillation
Authors: Shaojie Li, Mingbao Lin, Yan Wang, Yongjian Wu, Yonghong Tian, Ling Shao, Rongrong Ji
Tags: FedML
Citations: 47
Date: 26 Mar 2021

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Authors: Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, B. Guo
Tags: ViT
Citations: 21,051
Date: 25 Mar 2021

Training data-efficient image transformers & distillation through attention
Authors: Hugo Touvron, Matthieu Cord, Matthijs Douze, Francisco Massa, Alexandre Sablayrolles, Hervé Jégou
Tags: ViT
Citations: 6,657
Date: 23 Dec 2020

Transformer Interpretability Beyond Attention Visualization
Authors: Hila Chefer, Shir Gur, Lior Wolf
Citations: 652
Date: 17 Dec 2020

Cross-Layer Distillation with Semantic Calibration
Authors: Defang Chen, Jian-Ping Mei, Yuan Zhang, Can Wang, Yan Feng, Chun Chen
Tags: FedML
Citations: 294
Date: 06 Dec 2020

Channel-wise Knowledge Distillation for Dense Prediction
Authors: Changyong Shu, Yifan Liu, Jianfei Gao, Zheng Yan, Chunhua Shen
Citations: 261
Date: 26 Nov 2020

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Authors: Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, ..., Matthias Minderer, G. Heigold, Sylvain Gelly, Jakob Uszkoreit, N. Houlsby
Tags: ViT
Citations: 40,217
Date: 22 Oct 2020

Deformable DETR: Deformable Transformers for End-to-End Object Detection
Authors: Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai
Tags: ViT
Citations: 4,993
Date: 08 Oct 2020

Knowledge Distillation: A Survey
Authors: Jianping Gou, B. Yu, Stephen J. Maybank, Dacheng Tao
Tags: VLM
Citations: 2,907
Date: 09 Jun 2020

End-to-End Object Detection with Transformers
Authors: Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko
Tags: ViT, 3DV, PINN
Citations: 12,847
Date: 26 May 2020

HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training
Authors: Linjie Li, Yen-Chun Chen, Yu Cheng, Zhe Gan, Licheng Yu, Jingjing Liu
Tags: MLLM, VLM, OffRL, AI4TS
Citations: 496
Date: 01 May 2020

Designing Network Design Spaces
Authors: Ilija Radosavovic, Raj Prateek Kosaraju, Ross B. Girshick, Kaiming He, Piotr Dollár
Tags: GNN
Citations: 1,672
Date: 30 Mar 2020

Prototype Rectification for Few-Shot Learning
Authors: Jinlu Liu, Liang Song, Yongqiang Qin
Citations: 247
Date: 25 Nov 2019

DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Authors: Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
Citations: 7,386
Date: 02 Oct 2019

TinyBERT: Distilling BERT for Natural Language Understanding
Authors: Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu
Tags: VLM
Citations: 1,838
Date: 23 Sep 2019

Patient Knowledge Distillation for BERT Model Compression
Authors: S. Sun, Yu Cheng, Zhe Gan, Jingjing Liu
Citations: 833
Date: 25 Aug 2019

VL-BERT: Pre-training of Generic Visual-Linguistic Representations
Authors: Weijie Su, Xizhou Zhu, Yue Cao, Bin Li, Lewei Lu, Furu Wei, Jifeng Dai
Tags: VLM, MLLM, SSL
Citations: 1,657
Date: 22 Aug 2019

LXMERT: Learning Cross-Modality Encoder Representations from Transformers
Authors: Hao Tan, Mohit Bansal
Tags: VLM, MLLM
Citations: 2,467
Date: 20 Aug 2019

Learning Imbalanced Datasets with Label-Distribution-Aware Margin Loss
Authors: Kaidi Cao, Colin Wei, Adrien Gaidon, Nikos Arechiga, Tengyu Ma
Citations: 1,583
Date: 18 Jun 2019

Distilling Object Detectors with Fine-grained Feature Imitation
Authors: Tao Wang, Li-xin Yuan, Xiaopeng Zhang, Jiashi Feng
Tags: ObjD
Citations: 379
Date: 09 Jun 2019

When Does Label Smoothing Help?
Authors: Rafael Müller, Simon Kornblith, Geoffrey E. Hinton
Tags: UQCV
Citations: 1,931
Date: 06 Jun 2019

Variational Information Distillation for Knowledge Transfer
Authors: SungSoo Ahn, S. Hu, Andreas C. Damianou, Neil D. Lawrence, Zhenwen Dai
Citations: 612
Date: 11 Apr 2019

Relational Knowledge Distillation
Authors: Wonpyo Park, Dongju Kim, Yan Lu, Minsu Cho
Citations: 1,396
Date: 10 Apr 2019

Cross-Modal Self-Attention Network for Referring Image Segmentation
Authors: Linwei Ye, Mrigank Rochan, Zhi Liu, Yang Wang
Tags: EgoV
Citations: 472
Date: 09 Apr 2019

A Comprehensive Overhaul of Feature Distillation
Authors: Byeongho Heo, Jeesoo Kim, Sangdoo Yun, Hyojin Park, Nojun Kwak, J. Choi
Citations: 571
Date: 03 Apr 2019

Class-Balanced Loss Based on Effective Number of Samples
Authors: Yin Cui, Menglin Jia, Tsung-Yi Lin, Yang Song, Serge J. Belongie
Citations: 2,253
Date: 16 Jan 2019

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Authors: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
Tags: VLM, SSL, SSeg
Citations: 93,936
Date: 11 Oct 2018

Learning Deep Representations with Probabilistic Knowledge Transfer
Authors: Nikolaos Passalis, Anastasios Tefas
Citations: 407
Date: 28 Mar 2018

To prune, or not to prune: exploring the efficacy of pruning for model compression
Authors: Michael Zhu, Suyog Gupta
Citations: 1,262
Date: 05 Oct 2017

Semantic Foggy Scene Understanding with Synthetic Data
Authors: Christos Sakaridis, Dengxin Dai, Luc Van Gool
Citations: 1,094
Date: 25 Aug 2017

Deep Hashing Network for Unsupervised Domain Adaptation
Authors: Hemanth Venkateswara, José Eusébio, Shayok Chakraborty, S. Panchanathan
Tags: OOD
Citations: 2,023
Date: 22 Jun 2017

Attention Is All You Need
Authors: Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin
Tags: 3DV
Citations: 129,831
Date: 12 Jun 2017

Learning multiple visual domains with residual adapters
Authors: Sylvestre-Alvise Rebuffi, Hakan Bilen, Andrea Vedaldi
Tags: OOD
Citations: 924
Date: 22 May 2017

Prototypical Networks for Few-shot Learning
Authors: Jake C. Snell, Kevin Swersky, R. Zemel
Citations: 8,072
Date: 15 Mar 2017

Paying More Attention to Attention: Improving the Performance of Convolutional Neural Networks via Attention Transfer
Authors: Sergey Zagoruyko, N. Komodakis
Citations: 2,561
Date: 12 Dec 2016

The Cityscapes Dataset for Semantic Urban Scene Understanding
Authors: Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele
Citations: 11,540
Date: 06 Apr 2016

Quantized Convolutional Neural Networks for Mobile Devices
Authors: Jiaxiang Wu, Cong Leng, Yuhang Wang, Qinghao Hu, Jian Cheng
Tags: MQ
Citations: 1,162
Date: 21 Dec 2015

Distilling the Knowledge in a Neural Network
Authors: Geoffrey E. Hinton, Oriol Vinyals, J. Dean
Tags: FedML
Citations: 19,448
Date: 09 Mar 2015

FitNets: Hints for Thin Deep Nets
Authors: Adriana Romero, Nicolas Ballas, Samira Ebrahimi Kahou, Antoine Chassang, C. Gatta, Yoshua Bengio
Tags: FedML
Citations: 3,862
Date: 19 Dec 2014

Microsoft COCO: Common Objects in Context
Authors: Tsung-Yi Lin, Michael Maire, Serge J. Belongie, Lubomir Bourdev, Ross B. Girshick, James Hays, Pietro Perona, Deva Ramanan, C. L. Zitnick, Piotr Dollár
Tags: ObjD
Citations: 43,290
Date: 01 May 2014

Do Deep Nets Really Need to be Deep?
Authors: Lei Jimmy Ba, R. Caruana
Citations: 2,114
Date: 21 Dec 2013