1912.10211
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
Papers citing
"PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"
50 / 216 papers shown
Title
Embedding Compression for Teacher-to-Student Knowledge Transfer
Yiwei Ding
Alexander Lerch
31
1
0
09 Feb 2024
Masked Audio Modeling with CLAP and Multi-Objective Learning
Yifei Xin
Xiulian Peng
Yan Lu
66
8
0
29 Jan 2024
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Nicolae-Cătălin Ristea
Andrei Anghel
Radu Tudor Ionescu
43
2
0
15 Jan 2024
DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal Pitch Estimation
Haojie Wei
Xueke Cao
Wenbo Xu
Tangpeng Dan
Yueguo Chen
VLM
27
2
0
08 Jan 2024
Self-Supervised Learning for Few-Shot Bird Sound Classification
Ilyass Moummad
Romain Serizel
Nicolas Farrugia
SSL
28
9
0
25 Dec 2023
Consistent and Relevant: Rethink the Query Embedding in General Sound Separation
Yuanyuan Wang
Hangting Chen
Dongchao Yang
Jianwei Yu
Chao Weng
Zhiyong Wu
Helen M. Meng
22
6
0
24 Dec 2023
On the choice of the optimal temporal support for audio classification with Pre-trained embeddings
Aurian Quélennec
Michel Olvera
Geoffroy Peeters
S. Essid
35
2
0
21 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
31
29
0
15 Dec 2023
Building Ears for Robots: Machine Hearing in the Age of Autonomy
Xuan Zhong
15
0
0
04 Dec 2023
Optimizing Context-Enhanced Relational Joins
Viktor Sanca
Manos Chatzakis
Anastasia Ailamaki
32
2
0
03 Dec 2023
Audio Prompt Tuning for Universal Sound Separation
Yuzhuo Liu
Xubo Liu
Yan Zhao
Yuanyuan Wang
Rui Xia
Pingchuan Tain
Yuxuan Wang
VLM
41
5
0
30 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
38
17
0
27 Nov 2023
AQUATK: An Audio Quality Assessment Toolkit
Ashvala Vinay
Alexander Lerch
21
2
0
16 Nov 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
37
9
0
25 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
35
3
0
10 Oct 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
45
0
0
28 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
Vassilis Katsouros
CLIP
46
6
0
21 Sep 2023
NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement
Wen Wang
Dongchao Yang
Qichen Ye
Bowen Cao
Yuexian Zou
DiffM
50
3
0
03 Sep 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
38
0
0
30 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
37
6
0
23 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
45
1
0
14 Aug 2023
Separate Anything You Describe
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yiitan Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
38
43
0
09 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
36
13
0
08 Aug 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Yifei Xin
Yuexian Zou
52
9
0
28 Jul 2023
Complete and separate: Conditional separation with missing target source attribute completion
Dimitrios Bralios
Efthymios Tzinis
Paris Smaragdis
46
0
0
27 Jul 2023
Dataset balancing can hurt model performance
R. C. Moore
D. Ellis
Eduardo Fonseca
Shawn Hershey
A. Jansen
Manoj Plakal
35
9
0
30 Jun 2023
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Le Zhuo
Ruibin Yuan
Jiahao Pan
Yi Ma
Yizhi Li
...
Chenghua Lin
Emmanouil Benetos
Wenhu Chen
Wei Xue
Yi-Ting Guo
43
16
0
29 Jun 2023
Enhance Temporal Relations in Audio Captioning with Sound Event Detection
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
40
10
0
02 Jun 2023
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Xilin Jiang
Yinghao Aaron Li
N. Mesgarani
CLL
29
1
0
29 May 2023
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
42
4
0
29 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
41
10
0
22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
53
116
0
18 May 2023
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
Kai Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
18
17
0
11 May 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
32
40
0
11 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
48
5
0
02 May 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
33
6
0
22 Apr 2023
Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Wenke Xia
Xingjian Li
Andong Deng
Haoyi Xiong
Dejing Dou
Di Hu
27
5
0
16 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
43
2
0
12 Apr 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
30
8
0
07 Apr 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
26
43
0
30 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
44
22
0
19 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
29
0
0
15 Mar 2023
Target Sound Extraction with Variable Cross-modality Clues
Chenda Li
Yao Qian
Zhuo Chen
Dongmei Wang
Takuya Yoshioka
Shujie Liu
Y. Qian
Michael Zeng
VLM
37
13
0
15 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
78
14
0
14 Mar 2023
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu
Hanlin Lu
Jianbo Yuan
Xinyu Li
ViT
33
22
0
14 Mar 2023
Heterogeneous Graph Learning for Acoustic Event Classification
A. Shirian
Mona Ahmadian
Krishna Somandepalli
T. Guha
44
2
0
05 Mar 2023
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
27
4
0
03 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
43
4
0
03 Mar 2023
Incremental Learning of Acoustic Scenes and Sound Events
Manjunath Mulimani
A. Mesaros
CLL
32
6
0
28 Feb 2023
Data leakage in cross-modal retrieval training: A case study
Benno Weck
Xavier Serra
33
7
0
23 Feb 2023