PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation

2 February 2021
Yuan Gong
Yu-An Chung
James R. Glass
    VLM

Papers citing "PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation"

29 / 29 papers shown
Fourier Head: Helping Large Language Models Learn Complex Probability Distributions
Nate Gillman
Daksh Aggarwal
Michael Freeman
Saurabh Singh
Chen Sun
AI4TS
41
3
0
29 Oct 2024
Generalization in birdsong classification: impact of transfer learning methods and dataset characteristics
Burooj Ghani
Vincent J. Kalkman
Bob Planqué
Willem-Pier Vellinga
L. Gill
Dan Stowell
VLM
32
5
0
21 Sep 2024
Exploring Differences between Human Perception and Model Inference in Audio Event Recognition
Yizhou Tan
Yanru Wu
Yuanbo Hou
Xin Xu
Hui Bu
Shengchen Li
Dick Botteldooren
Mark D. Plumbley
33
0
0
10 Sep 2024
Audio-based Step-count Estimation for Running -- Windowing and Neural Network Baselines
Philipp Wagner
Andreas Triantafyllopoulos
Alexander Gebhard
Björn Schuller
35
0
0
10 Jun 2024
AudioRepInceptionNeXt: A lightweight single-stream architecture for efficient audio recognition
Kin Wai Lau
Yasar Abbas Ur Rehman
L. Po
35
1
0
21 Apr 2024
Siamese Vision Transformers are Scalable Audio-visual Learners
Yan-Bo Lin
Gedas Bertasius
37
5
0
28 Mar 2024
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
25
17
0
27 Nov 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
28
0
0
28 Sep 2023
Joint Audio and Speech Understanding
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
28
66
0
25 Sep 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
40
1
0
14 Aug 2023
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
K. Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
18
17
0
11 May 2023
MMViT: Multiscale Multiview Vision Transformers
Yuchen Liu
Natasha Ong
Kaiyan Peng
Bo Xiong
Qifan Wang
...
Madian Khabsa
Kaiyue Yang
David C. Liu
Donald Williamson
Hanchao Yu
ViT
22
4
0
28 Apr 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
35
22
0
19 Mar 2023
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu
Hanlin Lu
Jianbo Yuan
Xinyu Li
ViT
24
22
0
14 Mar 2023
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
21
4
0
03 Mar 2023
Efficient Large-scale Audio Tagging via Transformer-to-CNN Knowledge Distillation
Florian Schmid
Khaled Koutini
Gerhard Widmer
ViT
20
58
0
09 Nov 2022
Play It Back: Iterative Attention for Audio Recognition
Alexandros Stergiou
Dima Damen
31
4
0
20 Oct 2022
Learning Temporal Resolution in Spectrogram for Audio Classification
Haohe Liu
Xubo Liu
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
34
7
0
04 Oct 2022
Contrastive Audio-Visual Masked Autoencoder
Yuan Gong
Andrew Rouditchenko
Alexander H. Liu
David F. Harwath
Leonid Karlinsky
Hilde Kuehne
James R. Glass
32
120
0
02 Oct 2022
UAVM: Towards Unifying Audio and Visual Models
Yuan Gong
Alexander H. Liu
Andrew Rouditchenko
James R. Glass
27
20
0
29 Jul 2022
Segment-level Metric Learning for Few-shot Bioacoustic Event Detection
Haohe Liu
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
23
8
0
15 Jul 2022
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
13
268
0
13 Jul 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
N. Harada
K. Kashino
SSL
36
53
0
15 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Mohit Bansal
Gedas Bertasius
35
39
0
06 Apr 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
27
9
0
25 Mar 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
118
264
0
02 Feb 2022
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
31
267
0
21 Oct 2021
Study of positional encoding approaches for Audio Spectrogram Transformers
L. Pepino
Pablo Riera
Luciana Ferrer
ViT
26
6
0
13 Oct 2021
There Are Many Consistent Explanations of Unlabeled Data: Why You Should Average
Ben Athiwaratkun
Marc Finzi
Pavel Izmailov
A. Wilson
199
243
0
14 Jun 2018