PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation

2 February 2021

Papers citing "PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation"

Title | Authors | Tags | Counts | Date
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions | Nate Gillman, Daksh Aggarwal, Michael Freeman, Saurabh Singh, Chen Sun | AI4TS | 41 3 0 | 29 Oct 2024
Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics | Burooj Ghani, Vincent J. Kalkman, Bob Planqué, Willem-Pier Vellinga, L. Gill, Dan Stowell | VLM | 32 5 0 | 21 Sep 2024
Exploring Differences between Human Perception and Model Inference in Audio Event Recognition | Yizhou Tan, Yanru Wu, Yuanbo Hou, Xin Xu, Hui Bu, Shengchen Li, Dick Botteldooren, Mark D. Plumbley | - | 33 0 0 | 10 Sep 2024
Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines | Philipp Wagner, Andreas Triantafyllopoulos, Alexander Gebhard, Björn Schuller | - | 35 0 0 | 10 Jun 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition | Kin Wai Lau, Yasar Abbas Ur Rehman, L. Po | - | 35 1 0 | 21 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners | Yan-Bo Lin, Gedas Bertasius | - | 37 5 0 | 28 Mar 2024
A-JEPA: Joint-Embedding Predictive Architecture Can Listen | Zhengcong Fei, Mingyuan Fan, Junshi Huang | - | 25 17 0 | 27 Nov 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description | Youbin Jeon, Yanzhen Ren | VLM | 28 0 0 | 28 Sep 2023
Joint Audio and Speech Understanding | Yuan Gong, Alexander H. Liu, Hongyin Luo, Leonid Karlinsky, James R. Glass | AuLLM | 28 66 0 | 25 Sep 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes | Zhaohui Li, Haitao Wang, Xinghua Jiang | - | 40 1 0 | 14 Aug 2023
Universal Source Separation with Weakly Labelled Data | Qiuqiang Kong, K. Chen, Haohe Liu, Xingjian Du, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Mark D. Plumbley | - | 18 17 0 | 11 May 2023
MMViT: Multiscale Multiview Vision Transformers | Yuchen Liu, Natasha Ong, Kaiyan Peng, Bo Xiong, Qifan Wang, ..., Madian Khabsa, Kaiyue Yang, David C. Liu, Donald Williamson, Hanchao Yu | ViT | 22 4 0 | 28 Apr 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification | Wenjie Zhu, M. Omar | - | 35 22 0 | 19 Mar 2023
CAT: Causal Audio Transformer for Audio Classification | Xiaoyu Liu, Hanlin Lu, Jianbo Yuan, Xinyu Li | ViT | 24 22 0 | 14 Mar 2023
Low-Complexity Audio Embedding Extractors | Florian Schmid, Khaled Koutini, Gerhard Widmer | - | 21 4 0 | 03 Mar 2023
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation | Florian Schmid, Khaled Koutini, Gerhard Widmer | ViT | 20 58 0 | 09 Nov 2022
Play It Back: Iterative Attention for Audio Recognition | Alexandros Stergiou, Dima Damen | - | 31 4 0 | 20 Oct 2022
Learning Temporal Resolution in Spectrogram for Audio Classification | Haohe Liu, Xubo Liu, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley | - | 34 7 0 | 04 Oct 2022
Contrastive Audio-Visual Masked Autoencoder | Yuan Gong, Andrew Rouditchenko, Alexander H. Liu, David F. Harwath, Leonid Karlinsky, Hilde Kuehne, James R. Glass | - | 32 120 0 | 02 Oct 2022
UAVM: Towards Unifying Audio and Visual Models | Yuan Gong, Alexander H. Liu, Andrew Rouditchenko, James R. Glass | - | 27 20 0 | 29 Jul 2022
Segment-level Metric Learning for Few-shot Bioacoustic Event Detection | Haohe Liu, Xubo Liu, Xinhao Mei, Qiuqiang Kong, Wenwu Wang, Mark D. Plumbley | - | 23 8 0 | 15 Jul 2022
Masked Autoencoders that Listen | Po-Yao (Bernie) Huang, Hu Xu, Juncheng Billy Li, Alexei Baevski, Michael Auli, Wojciech Galuba, Florian Metze, Christoph Feichtenhofer | - | 13 268 0 | 13 Jul 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations | Daisuke Niizumi, Daiki Takeuchi, Yasunori Ohishi, N. Harada, K. Kashino | SSL | 36 53 0 | 15 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound | Yan-Bo Lin, Jie Lei, Mohit Bansal, Gedas Bertasius | - | 35 39 0 | 06 Apr 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification | Juncheng Billy Li, Shuhui Qu, Po-Yao (Bernie) Huang, Florian Metze | VLM | 27 9 0 | 25 Mar 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection | Ke Chen, Xingjian Du, Bilei Zhu, Zejun Ma, Taylor Berg-Kirkpatrick, Shlomo Dubnov | ViT | 118 264 0 | 02 Feb 2022
Wav2CLIP: Learning Robust Audio Representations From CLIP | Ho-Hsiang Wu, Prem Seetharaman, Kundan Kumar, J. P. Bello | CLIP VLM | 31 267 0 | 21 Oct 2021
Study of positional encoding approaches for Audio Spectrogram Transformers | L. Pepino, Pablo Riera, Luciana Ferrer | ViT | 26 6 0 | 13 Oct 2021
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average | Ben Athiwaratkun, Marc Finzi, Pavel Izmailov, A. Wilson | - | 199 243 0 | 14 Jun 2018