Transformation of audio embeddings into interpretable, concept-based representations

18 April 2025

Papers cited by "Transformation of audio embeddings into interpretable, concept-based representations"


Title
SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation Dennis Fucci Marco Gaido Beatrice Savoldi Matteo Negri Mauro Cettolo L. Bentivogli 246 2 0 03 Nov 2024
Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) Usha Bhalla Alexander X. Oesterling Suraj Srinivas Flavio du Pin Calmon Himabindu Lakkaraju 105 40 0 16 Feb 2024
Label-Free Concept Bottleneck Models Tuomas P. Oikarinen Subhro Das Lam M. Nguyen Tsui-Wei Weng 86 177 0 12 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research Xinhao Mei Chutong Meng Haohe Liu Qiuqiang Kong Tom Ko Chengqi Zhao Mark D. Plumbley Yuexian Zou Wenwu Wang 117 211 0 30 Mar 2023
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation Yusong Wu Kai Chen Tianyu Zhang Yuchen Hui Marianna Nezhurina Taylor Berg-Kirkpatrick Shlomo Dubnov CLIP 122 531 0 12 Nov 2022
Disentangling visual and written concepts in CLIP Joanna Materzyńska Antonio Torralba David Bau CoGe 61 51 0 15 Jun 2022
Vocalsound: A Dataset for Improving Human Vocal Sounds Recognition Yuan Gong Jingbo Yu James R. Glass 67 40 0 06 May 2022
CLIP-Dissect: Automatic Description of Neuron Representations in Deep Vision Networks Tuomas P. Oikarinen Tsui-Wei Weng VLM 50 88 1 23 Apr 2022
Listen to Interpret: Post-hoc Interpretability for Audio Networks with NMF Jayneel Parekh Sanjeel Parekh Pavlo Mozharovskyi Florence d'Alché-Buc G. Richard 52 25 0 23 Feb 2022
AudioCLIP: Extending CLIP to Image, Text and Audio A. Guzhov Federico Raue Jörn Hees Andreas Dengel CLIP VLM 119 366 0 24 Jun 2021
Learning Transferable Visual Models From Natural Language Supervision Alec Radford Jong Wook Kim Chris Hallacy Aditya A. Ramesh Gabriel Goh ... Amanda Askell Pamela Mishkin Jack Clark Gretchen Krueger Ilya Sutskever CLIP VLM 929 29,436 0 26 Feb 2021
FSD50K: An Open Dataset of Human-Labeled Sound Events Eduardo Fonseca Xavier Favory Jordi Pons F. Font Xavier Serra 77 459 0 01 Oct 2020
Concept Bottleneck Models Pang Wei Koh Thao Nguyen Y. S. Tang Stephen Mussmann Emma Pierson Been Kim Percy Liang 96 823 0 09 Jul 2020
Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors Ruihan Zhang Prashan Madumal Tim Miller Krista A. Ehinger Benjamin I. P. Rubinstein FAtt 61 104 0 27 Jun 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition Qiuqiang Kong Yin Cao Turab Iqbal Yuxuan Wang Wenwu Wang Mark D. Plumbley VLM SSL 189 1,076 0 21 Dec 2019
Clotho: An Audio Captioning Dataset Konstantinos Drossos Samuel Lipping Tuomas Virtanen 98 389 0 21 Oct 2019
European Union regulations on algorithmic decision-making and a "right to explanation" B. Goodman Seth Flaxman FaML AILaw 63 1,901 0 28 Jun 2016
"Why Should I Trust You?": Explaining the Predictions of Any Classifier Marco Tulio Ribeiro Sameer Singh Carlos Guestrin FAtt FaML 1.2K 16,990 0 16 Feb 2016