ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2203.06760
  4. Cited By
CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio
  Classification

CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification

13 March 2022
Yuan Gong
Sameer Khurana
Andrew Rouditchenko
James R. Glass
    VLM
ArXivPDFHTML

Papers citing "CMKD: CNN/Transformer-Based Cross-Model Knowledge Distillation for Audio Classification"

48 / 48 papers shown
Title
GATE3D: Generalized Attention-based Task-synergized Estimation in 3D*
GATE3D: Generalized Attention-based Task-synergized Estimation in 3D*
Eunsoo Im
Jung Kwon Lee
Changhyun Jee
120
0
0
15 Apr 2025
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound
  Classification and Detection
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
154
271
0
02 Feb 2022
Patches Are All You Need?
Patches Are All You Need?
Asher Trockman
J. Zico Kolter
ViT
260
410
0
24 Jan 2022
A ConvNet for the 2020s
A ConvNet for the 2020s
Zhuang Liu
Hanzi Mao
Chaozheng Wu
Christoph Feichtenhofer
Trevor Darrell
Saining Xie
ViT
159
5,171
0
10 Jan 2022
Temporal Knowledge Distillation for On-device Audio Classification
Temporal Knowledge Distillation for On-device Audio Classification
Kwanghee Choi
Martin Kersner
Jacob Morton
Buru Chang
87
26
0
27 Oct 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
145
269
0
21 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Sangeeta Srivastava
Yun Wang
Andros Tjandra
Anurag Kumar
Chunxi Liu
Kritika Singh
Yatharth Saraf
SSL
67
24
0
14 Oct 2021
Efficient Training of Audio Transformers with Patchout
Efficient Training of Audio Transformers with Patchout
Khaled Koutini
Jan Schluter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
128
257
0
11 Oct 2021
Do Vision Transformers See Like Convolutional Neural Networks?
Do Vision Transformers See Like Convolutional Neural Networks?
M. Raghu
Thomas Unterthiner
Simon Kornblith
Chiyuan Zhang
Alexey Dosovitskiy
ViT
132
958
0
19 Aug 2021
Mobile-Former: Bridging MobileNet and Transformer
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
258
488
0
12 Aug 2021
Improving Sound Event Classification by Increasing Shift Invariance in
  Convolutional Neural Networks
Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks
Eduardo Fonseca
Andrés Ferraro
Xavier Serra
AI4TS
98
9
0
01 Jul 2021
Early Convolutions Help Transformers See Better
Early Convolutions Help Transformers See Better
Tete Xiao
Mannat Singh
Eric Mintun
Trevor Darrell
Piotr Dollár
Ross B. Girshick
52
768
0
28 Jun 2021
Co-advise: Cross Inductive Bias Distillation
Co-advise: Cross Inductive Bias Distillation
Sucheng Ren
Zhengqi Gao
Tianyu Hua
Zihui Xue
Yonglong Tian
Shengfeng He
Hang Zhao
74
52
0
23 Jun 2021
Knowledge distillation: A good teacher is patient and consistent
Knowledge distillation: A good teacher is patient and consistent
Lucas Beyer
Xiaohua Zhai
Amelie Royer
L. Markeeva
Rohan Anil
Alexander Kolesnikov
VLM
107
295
0
09 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
114
1,204
0
09 Jun 2021
Intriguing Properties of Vision Transformers
Intriguing Properties of Vision Transformers
Muzammal Naseer
Kanchana Ranasinghe
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
Ming-Hsuan Yang
ViT
316
649
0
21 May 2021
ConTNet: Why not use convolution and transformer at the same time?
ConTNet: Why not use convolution and transformer at the same time?
Haotian Yan
Zhe Li
Weijian Li
Changhu Wang
Ming Wu
Chuang Zhang
ViT
67
77
0
27 Apr 2021
AST: Audio Spectrogram Transformer
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
116
865
0
05 Apr 2021
Incorporating Convolution Designs into Visual Transformers
Incorporating Convolution Designs into Visual Transformers
Kun Yuan
Shaopeng Guo
Ziwei Liu
Aojun Zhou
F. Yu
Wei Wu
ViT
111
478
0
22 Mar 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and
  Aggregation
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Yuan Gong
Yu-An Chung
James R. Glass
VLM
166
147
0
02 Feb 2021
Training data-efficient image transformers & distillation through
  attention
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
384
6,768
0
23 Dec 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at
  Scale
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
651
41,103
0
22 Oct 2020
FSD50K: An Open Dataset of Human-Labeled Sound Events
FSD50K: An Open Dataset of Human-Labeled Sound Events
Eduardo Fonseca
Xavier Favory
Jordi Pons
F. Font
Xavier Serra
75
459
0
01 Oct 2020
Rethinking CNN Models for Audio Classification
Rethinking CNN Models for Audio Classification
Kamalesh Palanisamy
Dipika Singhania
Angela Yao
SSL
67
145
0
22 Jul 2020
Transferring Inductive Biases through Knowledge Distillation
Transferring Inductive Biases through Knowledge Distillation
Samira Abnar
Mostafa Dehghani
Willem H. Zuidema
69
59
0
31 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
223
3,139
0
16 May 2020
Addressing Missing Labels in Large-Scale Sound Event Recognition Using a
  Teacher-Student Framework With Loss Masking
Addressing Missing Labels in Large-Scale Sound Event Recognition Using a Teacher-Student Framework With Loss Masking
Eduardo Fonseca
Shawn Hershey
Manoj Plakal
D. Ellis
A. Jansen
R. C. Moore
Xavier Serra
NoLa
87
23
0
02 May 2020
ESResNet: Environmental Sound Classification Based on Visual Domain
  Models
ESResNet: Environmental Sound Classification Based on Visual Domain Models
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
VLM
104
92
0
15 Apr 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern
  Recognition
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
186
1,076
0
21 Dec 2019
Self-training with Noisy Student improves ImageNet classification
Self-training with Noisy Student improves ImageNet classification
Qizhe Xie
Minh-Thang Luong
Eduard H. Hovy
Quoc V. Le
NoLa
307
2,386
0
11 Nov 2019
Urban Sound Tagging using Convolutional Neural Networks
Urban Sound Tagging using Convolutional Neural Networks
Sainath Adapa
59
38
0
27 Sep 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
653
24,464
0
26 Jul 2019
Scalable Syntax-Aware Language Models Using Knowledge Distillation
Scalable Syntax-Aware Language Models Using Knowledge Distillation
A. Kuncoro
Chris Dyer
Laura Rimell
S. Clark
Phil Blunsom
121
26
0
14 Jun 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
139
18,134
0
28 May 2019
SpecAugment: A Simple Data Augmentation Method for Automatic Speech
  Recognition
SpecAugment: A Simple Data Augmentation Method for Automatic Speech Recognition
Daniel S. Park
William Chan
Yu Zhang
Chung-Cheng Chiu
Barret Zoph
E. D. Cubuk
Quoc V. Le
VLM
177
3,461
0
18 Apr 2019
A Comparison of Five Multiple Instance Learning Pooling Functions for
  Sound Event Detection with Weak Labeling
A Comparison of Five Multiple Instance Learning Pooling Functions for Sound Event Detection with Weak Labeling
Yun Wang
Juncheng Billy Li
Florian Metze
46
184
0
22 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
A Closer Look at Weak Label Learning for Audio Events
A Closer Look at Weak Label Learning for Audio Events
Ankit Parag Shah
Anurag Kumar
Alexander G. Hauptmann
Bhiksha Raj
65
65
0
24 Apr 2018
Averaging Weights Leads to Wider Optima and Better Generalization
Averaging Weights Leads to Wider Optima and Better Generalization
Pavel Izmailov
Dmitrii Podoprikhin
T. Garipov
Dmitry Vetrov
A. Wilson
FedML
MoMe
130
1,662
0
14 Mar 2018
mixup: Beyond Empirical Risk Minimization
mixup: Beyond Empirical Risk Minimization
Hongyi Zhang
Moustapha Cissé
Yann N. Dauphin
David Lopez-Paz
NoLa
280
9,764
0
25 Oct 2017
Comparison of Time-Frequency Representations for Environmental Sound
  Classification using Convolutional Neural Networks
Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
M. Huzaifah
AI4TS
56
148
0
22 Jun 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
701
131,652
0
12 Jun 2017
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision
  Applications
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications
Andrew G. Howard
Menglong Zhu
Bo Chen
Dmitry Kalenichenko
Weijun Wang
Tobias Weyand
M. Andreetto
Hartwig Adam
3DH
1.1K
20,837
0
17 Apr 2017
CNN Architectures for Large-Scale Audio Classification
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
120
2,500
0
29 Sep 2016
Densely Connected Convolutional Networks
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
772
36,813
0
25 Aug 2016
Deep Convolutional Neural Networks and Data Augmentation for
  Environmental Sound Classification
Deep Convolutional Neural Networks and Data Augmentation for Environmental Sound Classification
Justin Salamon
J. P. Bello
67
1,304
0
15 Aug 2016
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
2.2K
194,020
0
10 Dec 2015
Distilling the Knowledge in a Neural Network
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
362
19,643
0
09 Mar 2015
1