Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation

9 November 2022

Papers citing "Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation"


Title | Authors | Tags | Counts | Date
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining | Paul Primus, Florian Schmid, Gerhard Widmer | CLIP, AI4TS, VLM | 33 0 0 | 12 May 2025
Exploring Performance-Complexity Trade-Offs in Sound Event Detection | T. Morocutti, Florian Schmid, Jonathan Greif, Francesco Foscarin, Gerhard Widmer | | 38 0 0 | 14 Mar 2025
Creating a Good Teacher for Knowledge Distillation in Acoustic Scene Classification | T. Morocutti, Florian Schmid, Khaled Koutini, Gerhard Widmer | | 39 0 0 | 14 Mar 2025
A Survey of Recent Advances and Challenges in Deep Audio-Visual Correlation Learning | Luis Vilaca, Yi Yu, Paula Vinan | | 75 0 0 | 24 Nov 2024
Data-Efficient Low-Complexity Acoustic Scene Classification via Distilling and Progressive Pruning | Bing Han, Wen Huang, Zhengyang Chen, Anbai Jiang, Pingyi Fan, Cheng Lu, Zhiqiang Lv, Jia Liu, W. Zhang, Yanmin Qian | | 31 2 0 | 28 Oct 2024
Spectral and Rhythm Features for Audio Classification with Deep Convolutional Neural Networks | Friedrich Wolf-Monheim | | 21 2 0 | 09 Oct 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events | Xiaoyu Yang, Qiujia Li, Chao Zhang, P. Woodland | | 24 0 0 | 25 Sep 2024
Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics | Burooj Ghani, Vincent J. Kalkman, Bob Planqué, Willem-Pier Vellinga, L. Gill, Dan Stowell | VLM | 32 5 0 | 21 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection | Florian Schmid, T. Morocutti, Francesco Foscarin, Jan Schluter, Paul Primus, Gerhard Widmer | ViT | 28 2 0 | 14 Sep 2024
Wav2Small: Distilling Wav2Vec2 to 72K parameters for Low-Resource Speech emotion recognition | Dionyssos Kounadis-Bastian, Oliver Schrufer, Anna Derington, H. Wierstorf, F. Eyben, Felix Burkhardt, Björn Schuller | | 29 1 0 | 25 Aug 2024
Macformer: Transformer with Random Maclaurin Feature Attention | Yuhan Guo, Lizhong Ding, Ye Yuan, Guoren Wang | | 46 0 0 | 21 Aug 2024
Estimated Audio-Caption Correspondences Improve Language-Based Audio Retrieval | Paul Primus, Florian Schmid, Gerhard Widmer | | 31 2 0 | 21 Aug 2024
TEAdapter: Supply abundant guidance for controllable text-to-music generation | Jialing Zou, Jiahao Mei, Xudong Nan, Jinghua Li, Daoguo Dong, Liang He | | 31 0 0 | 09 Aug 2024
Integrating IP Broadcasting with Audio Tags: Workflow and Challenges | Rhys Burchett-Vass, Arshdeep Singh, Gabriel Bibbó, Mark D. Plumbley | | 29 0 0 | 22 Jul 2024
Efficient Audio Captioning with Encoder-Level Knowledge Distillation | Xuenan Xu, Haohe Liu, Mengyue Wu, Wenwu Wang, Mark D. Plumbley | | 40 1 0 | 19 Jul 2024
Improving Audio Spectrogram Transformers for Sound Event Detection Through Multi-Stage Training | Florian Schmid, Paul Primus, T. Morocutti, Jonathan Greif, Gerhard Widmer | | 32 5 0 | 17 Jul 2024
Fusing Audio and Metadata Embeddings Improves Language-based Audio Retrieval | Paul Primus, Gerhard Widmer | | 52 3 0 | 22 Jun 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition | Kin Wai Lau, Yasar Abbas Ur Rehman, L. Po | | 44 1 0 | 21 Apr 2024
Robust Active Speaker Detection in Noisy Environments | Siva Sai Nagender Vasireddy, Chenxu Zhang, Xiaohu Guo, Yapeng Tian | | 40 0 0 | 27 Mar 2024
Tuning In: Analysis of Audio Classifier Performance in Clinical Settings with Limited Data | Hamza Mahdi, Eptehal Nashnoush, Rami Saab, Arjun Balachandar, Rishit Dagli, Lucas X. Perri, H. Khosravani | | 24 1 0 | 07 Feb 2024
Dynamic Convolutional Neural Networks as Efficient Pre-trained Audio Models | Florian Schmid, Khaled Koutini, Gerhard Widmer | | 18 11 0 | 24 Oct 2023
Can Language Models Laugh at YouTube Short-form Videos? | Dayoon Ko, Sangho Lee, Gunhee Kim | | 36 6 0 | 22 Oct 2023
CED: Consistent ensemble distillation for audio tagging | Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Junbo Zhang, Yujun Wang | | 26 17 0 | 23 Aug 2023
Domain Information Control at Inference Time for Acoustic Scene Classification | Shahed Masoudian, Khaled Koutini, Markus Schedl, Gerhard Widmer, Navid Rekabsaz | | 26 1 0 | 13 Jun 2023
Adapting a ConvNeXt model to audio classification on AudioSet | Thomas Pellegrini, Ismail Khalfaoui-Hassani, Etienne Labbé, T. Masquelier | | 6 21 0 | 01 Jun 2023
Streaming Audio Transformers for Online Audio Tagging | Heinrich Dinkel, Zhiyong Yan, Yongqing Wang, Junbo Zhang, Yujun Wang, Bin Wang | | 34 4 0 | 29 May 2023
Low-Complexity Audio Embedding Extractors | Florian Schmid, Khaled Koutini, Gerhard Widmer | | 21 4 0 | 03 Mar 2023
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov | ViT | 118 264 0 | 02 Feb 2022
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation | Yuan Gong, Yu-An Chung, James R. Glass | VLM | 104 144 0 | 02 Feb 2021
MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications | Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, M. Andreetto, Hartwig Adam | 3DH | 950 20,567 0 | 17 Apr 2017