ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1912.10211
  4. Cited By
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern
  Recognition

PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
    VLM
    SSL
ArXivPDFHTML

Papers citing "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"

50 / 216 papers shown
Title
Embedding Compression for Teacher-to-Student Knowledge Transfer
Embedding Compression for Teacher-to-Student Knowledge Transfer
Yiwei Ding
Alexander Lerch
31
1
0
09 Feb 2024
Masked Audio Modeling with CLAP and Multi-Objective Learning
Masked Audio Modeling with CLAP and Multi-Objective Learning
Yifei Xin
Xiulian Peng
Yan Lu
66
8
0
29 Jan 2024
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Cascaded Cross-Modal Transformer for Audio-Textual Classification
Nicolae-Cătălin Ristea
Andrei Anghel
Radu Tudor Ionescu
43
2
0
15 Jan 2024
DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal
  Pitch Estimation
DJCM: A Deep Joint Cascade Model for Singing Voice Separation and Vocal Pitch Estimation
Haojie Wei
Xueke Cao
Wenbo Xu
Tangpeng Dan
Yueguo Chen
VLM
27
2
0
08 Jan 2024
Self-Supervised Learning for Few-Shot Bird Sound Classification
Self-Supervised Learning for Few-Shot Bird Sound Classification
Ilyass Moummad
Romain Serizel
Nicolas Farrugia
SSL
28
9
0
25 Dec 2023
Consistent and Relevant: Rethink the Query Embedding in General Sound
  Separation
Consistent and Relevant: Rethink the Query Embedding in General Sound Separation
Yuanyuan Wang
Hangting Chen
Dongchao Yang
Jianwei Yu
Chao Weng
Zhiyong Wu
Helen M. Meng
22
6
0
24 Dec 2023
On the choice of the optimal temporal support for audio classification
  with Pre-trained embeddings
On the choice of the optimal temporal support for audio classification with Pre-trained embeddings
Aurian Quélennec
Michel Olvera
Geoffroy Peeters
S. Essid
35
2
0
21 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
31
29
0
15 Dec 2023
Building Ears for Robots: Machine Hearing in the Age of Autonomy
Building Ears for Robots: Machine Hearing in the Age of Autonomy
Xuan Zhong
15
0
0
04 Dec 2023
Optimizing Context-Enhanced Relational Joins
Optimizing Context-Enhanced Relational Joins
Viktor Sanca
Manos Chatzakis
Anastasia Ailamaki
32
2
0
03 Dec 2023
Audio Prompt Tuning for Universal Sound Separation
Audio Prompt Tuning for Universal Sound Separation
Yuzhuo Liu
Xubo Liu
Yan Zhao
Yuanyuan Wang
Rui Xia
Pingchuan Tain
Yuxuan Wang
VLM
41
5
0
30 Nov 2023
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
A-JEPA: Joint-Embedding Predictive Architecture Can Listen
Zhengcong Fei
Mingyuan Fan
Junshi Huang
38
17
0
27 Nov 2023
AQUATK: An Audio Quality Assessment Toolkit
AQUATK: An Audio Quality Assessment Toolkit
Ashvala Vinay
Alexander Lerch
21
2
0
16 Nov 2023
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
CAD -- Contextual Multi-modal Alignment for Dynamic AVQA
Asmar Nadeem
Adrian Hilton
R. Dawes
Graham A. Thomas
A. Mustafa
37
9
0
25 Oct 2023
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Cross-modal Cognitive Consensus guided Audio-Visual Segmentation
Zhaofeng Shi
Qingbo Wu
Fanman Meng
Linfeng Xu
Hongliang Li
VOS
35
3
0
10 Oct 2023
Semantic Proximity Alignment: Towards Human Perception-consistent Audio
  Tagging by Aligning with Label Text Description
Semantic Proximity Alignment: Towards Human Perception-consistent Audio Tagging by Aligning with Label Text Description
Youbin Jeon
Yanzhen Ren
VLM
45
0
0
28 Sep 2023
Weakly-supervised Automated Audio Captioning via text only training
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
Vassilis Katsouros
CLIP
46
6
0
21 Sep 2023
NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement
NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement
Wen Wang
Dongchao Yang
Qichen Ye
Bowen Cao
Yuexian Zou
DiffM
50
3
0
03 Sep 2023
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
AGS: An Dataset and Taxonomy for Domestic Scene Sound Event Recognition
Nan Che
Chenrui Liu
Fei Yu
38
0
0
30 Aug 2023
Audio Difference Captioning Utilizing Similarity-Discrepancy
  Disentanglement
Audio Difference Captioning Utilizing Similarity-Discrepancy Disentanglement
Daiki Takeuchi
Yasunori Ohishi
Daisuke Niizumi
Noboru Harada
K. Kashino
37
6
0
23 Aug 2023
AudioFormer: Audio Transformer learns audio feature representations from discrete acoustic codes
Zhaohui Li
Haitao Wang
Xinghua Jiang
45
1
0
14 Aug 2023
Separate Anything You Describe
Separate Anything You Describe
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yiitan Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
38
43
0
09 Aug 2023
Advancing Natural-Language Based Audio Retrieval with PaSST and Large
  Audio-Caption Data Sets
Advancing Natural-Language Based Audio Retrieval with PaSST and Large Audio-Caption Data Sets
Paul Primus
Khaled Koutini
Gerhard Widmer
36
13
0
08 Aug 2023
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Yifei Xin
Yuexian Zou
52
9
0
28 Jul 2023
Complete and separate: Conditional separation with missing target source
  attribute completion
Complete and separate: Conditional separation with missing target source attribute completion
Dimitrios Bralios
Efthymios Tzinis
Paris Smaragdis
46
0
0
27 Jul 2023
Dataset balancing can hurt model performance
Dataset balancing can hurt model performance
R. C. Moore
D. Ellis
Eduardo Fonseca
Shawn Hershey
A. Jansen
Manoj Plakal
35
9
0
30 Jun 2023
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by
  Whispering to ChatGPT
LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT
Le Zhuo
Ruibin Yuan
Jiahao Pan
Yi Ma
Yizhi Li
...
Chenghua Lin
Emmanouil Benetos
Wenhu Chen
Wei Xue
Yi-Ting Guo
43
16
0
29 Jun 2023
Enhance Temporal Relations in Audio Captioning with Sound Event
  Detection
Enhance Temporal Relations in Audio Captioning with Sound Event Detection
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
40
10
0
02 Jun 2023
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes
Xilin Jiang
Yinghao Aaron Li
N. Mesgarani
CLL
29
1
0
29 May 2023
Streaming Audio Transformers for Online Audio Tagging
Streaming Audio Transformers for Online Audio Tagging
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
42
4
0
29 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
41
10
0
22 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited
  Modalities
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
53
116
0
18 May 2023
Universal Source Separation with Weakly Labelled Data
Universal Source Separation with Weakly Labelled Data
Qiuqiang Kong
Kai Chen
Haohe Liu
Xingjian Du
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Mark D. Plumbley
18
17
0
11 May 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency
  Model
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
32
40
0
11 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression
  loss acts as a regularizer
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
48
5
0
02 May 2023
A Comparative Study of Pre-trained Speech and Audio Embeddings for
  Speech Emotion Recognition
A Comparative Study of Pre-trained Speech and Audio Embeddings for Speech Emotion Recognition
Orchid Chetia Phukan
Arun Balaji Buduru
Rajesh Sharma
33
6
0
22 Apr 2023
Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Robust Cross-Modal Knowledge Distillation for Unconstrained Videos
Wenke Xia
Xingjian Li
Andong Deng
Haoyi Xiong
Dejing Dou
Di Hu
27
5
0
16 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual
  Cross-Modal Pairs for Audiovisual Representation Learning
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
43
2
0
12 Apr 2023
Graph Attention for Automated Audio Captioning
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
30
8
0
07 Apr 2023
Prefix tuning for automated audio captioning
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
26
43
0
30 Mar 2023
Multiscale Audio Spectrogram Transformer for Efficient Audio
  Classification
Multiscale Audio Spectrogram Transformer for Efficient Audio Classification
Wenjie Zhu
M. Omar
44
22
0
19 Mar 2023
Enhancing Unsupervised Audio Representation Learning via Adversarial
  Sample Generation
Enhancing Unsupervised Audio Representation Learning via Adversarial Sample Generation
Yulin Pan
Xiangteng He
Biao Gong
Yuxin Peng
Yiliang Lv
SSL
29
0
0
15 Mar 2023
Target Sound Extraction with Variable Cross-modality Clues
Target Sound Extraction with Variable Cross-modality Clues
Chenda Li
Yao Qian
Zhuo Chen
Dongmei Wang
Takuya Yoshioka
Shujie Liu
Y. Qian
Michael Zeng
VLM
37
13
0
15 Mar 2023
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet
  Tag-guided Synthetic Data
BLAT: Bootstrapping Language-Audio Pre-training based on AudioSet Tag-guided Synthetic Data
Xuenan Xu
Zhiling Zhang
Zelin Zhou
Pingyue Zhang
Zeyu Xie
Mengyue Wu
Ke Zhu
CLIP
78
14
0
14 Mar 2023
CAT: Causal Audio Transformer for Audio Classification
CAT: Causal Audio Transformer for Audio Classification
Xiaoyu Liu
Hanlin Lu
Jianbo Yuan
Xinyu Li
ViT
33
22
0
14 Mar 2023
Heterogeneous Graph Learning for Acoustic Event Classification
Heterogeneous Graph Learning for Acoustic Event Classification
A. Shirian
Mona Ahmadian
Krishna Somandepalli
T. Guha
44
2
0
05 Mar 2023
Low-Complexity Audio Embedding Extractors
Low-Complexity Audio Embedding Extractors
Florian Schmid
Khaled Koutini
Gerhard Widmer
27
4
0
03 Mar 2023
Unified Keyword Spotting and Audio Tagging on Mobile Devices with
  Transformers
Unified Keyword Spotting and Audio Tagging on Mobile Devices with Transformers
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Junbo Zhang
Yujun Wang
43
4
0
03 Mar 2023
Incremental Learning of Acoustic Scenes and Sound Events
Incremental Learning of Acoustic Scenes and Sound Events
Manjunath Mulimani
A. Mesaros
CLL
32
6
0
28 Feb 2023
Data leakage in cross-modal retrieval training: A case study
Data leakage in cross-modal retrieval training: A case study
Benno Weck
Xavier Serra
33
7
0
23 Feb 2023
Previous
12345
Next