PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition

21 December 2019
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
    VLM
    SSL

Papers citing "PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition"

50 / 216 papers shown
Title
Masked Spectrogram Modeling using Masked Autoencoders for Learning General-purpose Audio Representation
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
34
67
0
26 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
41
54
0
15 Apr 2022
On the pragmatism of using binary classifiers over data intensive neural network classifiers for detection of COVID-19 from voice
Ankit Parag Shah
Hira Dhamyal
Yang Gao
Daniel Arancibia
Mario Arancibia
Bhiksha Raj
Rita Singh
35
5
0
11 Apr 2022
SoundBeam: Target sound extraction conditioned on sound-class labels and enrollment clues for increased performance and continuous learning
Marc Delcroix
Jorge Bennasar Vázquez
Tsubasa Ochiai
K. Kinoshita
Yasunori Ohishi
S. Araki
VLM
24
32
0
08 Apr 2022
RaDur: A Reference-aware and Duration-robust Network for Target Sound Detection
Dongchao Yang
Helin Wang
Zhongjie Ye
Yuexian Zou
Wenwu Wang
33
0
0
05 Apr 2022
A Mixed supervised Learning Framework for Target Sound Detection
Dongchao Yang
Helin Wang
Yuexian Zou
Wenwu Wang
22
0
0
05 Apr 2022
A Temporal-oriented Broadcast ResNet for COVID-19 Detection
Xin Jing
Shuo Liu
Emilia Parada-Cabaleiro
Andreas Triantafyllopoulos
Meishu Song
Zijiang Yang
Björn W. Schuller
45
2
0
31 Mar 2022
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
26
21
0
29 Mar 2022
Audio-text Retrieval in Context
Siyu Lou
Xuenan Xu
Mengyue Wu
K. Yu
22
30
0
25 Mar 2022
AudioTagging Done Right: 2nd comparison of deep learning methods for environmental sound classification
Juncheng Billy Li
Shuhui Qu
Po-Yao (Bernie) Huang
Florian Metze
VLM
41
9
0
25 Mar 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Ye Liu
Siyuan Li
Yang Wu
C. Chen
Ying Shan
Xiaohu Qie
ViT
32
141
0
23 Mar 2022
Learning Audio Representations with MLPs
Mashrur M. Morshed
Ahmad Omar Ahsan
H. Mahmud
Md. Kamrul Hasan
35
4
0
16 Mar 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
38
30
0
06 Mar 2022
Audio-Based Deep Learning Frameworks for Detecting COVID-19
Dat Ngo
L. D. Pham
Hoang Van Truong
Ş. Kolozali
D. Jarchi
53
4
0
10 Feb 2022
Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, And Pretraining: An Ablation Study
Daniel C. Tompkins
Kshitiz Kumar
Jian Wu
22
5
0
07 Feb 2022
Learning strides in convolutional neural networks
Rachid Riad
O. Teboul
David Grangier
Neil Zeghidour
39
41
0
03 Feb 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
127
264
0
02 Feb 2022
Anomalous Sound Detection using Spectral-Temporal Information Fusion
Youde Liu
Jian Guan
Qiaoxi Zhu
Wenwu Wang
28
54
0
14 Jan 2022
An Ensemble of Deep Learning Frameworks Applied For Predicting Respiratory Anomalies
L. D. Pham
Dat Ngo
T. Hoang
Alexander Schindler
Ian McLoughlin
42
5
0
09 Jan 2022
Towards Learning Universal Audio Representations
Luyu Wang
Pauline Luc
Yan Wu
Adrià Recasens
Lucas Smaira
...
Andrew Jaegle
Jean-Baptiste Alayrac
Sander Dieleman
João Carreira
Aaron van den Oord
SSL
41
68
0
23 Nov 2021
Effect of noise suppression losses on speech distortion and ASR performance
Sebastian Braun
H. Gamper
22
19
0
23 Nov 2021
SALSA-Lite: A Fast and Effective Feature for Polyphonic Sound Event Localization and Detection with Microphone Arrays
Thi Ngoc Tho Nguyen
Douglas L. Jones
Karn N. Watcharasupat
Huy P Phan
W. Gan
33
36
0
16 Nov 2021
Who calls the shots? Rethinking Few-Shot Learning for Audio
Yu Wang
Nicholas J. Bryan
Justin Salamon
M. Cartwright
J. P. Bello
VLM
29
25
0
18 Oct 2021
Conformer-Based Self-Supervised Learning for Non-Speech Audio Tasks
Sangeeta Srivastava
Yun Wang
Andros Tjandra
Anurag Kumar
Chunxi Liu
Kritika Singh
Yatharth Saraf
SSL
38
24
0
14 Oct 2021
Diverse Audio Captioning via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
GAN
48
28
0
13 Oct 2021
Pano-AVQA: Grounded Audio-Visual Question Answering on 360° Videos
Heeseung Yun
Youngjae Yu
Wonsuk Yang
Kangil Lee
Gunhee Kim
30
79
0
11 Oct 2021
Can Audio Captions Be Evaluated with Image Caption Metrics?
Zelin Zhou
Zhiling Zhang
Xuenan Xu
Zeyu Xie
Mengyue Wu
Kenny Q. Zhu
30
43
0
10 Oct 2021
A Mutual learning framework for Few-shot Sound Event Detection
Dongchao Yang
Helin Wang
Yuexian Zou
Zhongjie Ye
Wenwu Wang
34
25
0
09 Oct 2021
A study of the robustness of raw waveform based speaker embeddings under mismatched conditions
Ge Zhu
Frank Cwitkowitz
Z. Duan
24
2
0
08 Oct 2021
Fairness and underspecification in acoustic scene classification: The case for disaggregated evaluations
Andreas Triantafyllopoulos
M. Milling
Konstantinos Drossos
Björn W. Schuller
26
7
0
04 Oct 2021
SpliceOut: A Simple and Efficient Audio Augmentation Method
Arjit Jain
Pranay Reddy Samala
Deepak Mittal
Preethi Jyothi
M. Singh
35
10
0
30 Sep 2021
Complementing Handcrafted Features with Raw Waveform Using a Light-weight Auxiliary Model
Zhongwei Teng
Quchen Fu
Jules White
Maria E. Powell
Douglas C. Schmidt
28
5
0
06 Sep 2021
Parsing Birdsong with Deep Audio Embeddings
Irina Tolkova
Brian Chu
Marcel Hedman
Stefan Kahl
Holger Klinck
36
10
0
20 Aug 2021
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
Andrew Koh
Fuzhao Xue
Chng Eng Siong
22
20
0
10 Aug 2021
DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs
J. Nistal
Stefan Lattner
G. Richard
37
8
0
03 Aug 2021
Audio Captioning Transformer
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
39
77
0
21 Jul 2021
A Multimodal Machine Learning Framework for Teacher Vocal Delivery Evaluation
Hang Li
Yunxing Kang
Y. Hao
Wenbiao Ding
Zhongqin Wu
Zitao Liu
30
4
0
15 Jul 2021
Multi-modal Affect Analysis using standardized data within subjects in the Wild
Sachihiro Youoku
Takahisa Yamamoto
Junya Saito
A. Uchida
Xiaoyue Mi
Ziqiang Shi
Liu Liu
Zhongling Liu
Osafumi Nakayama
Kentaro Murase
CVBM
32
6
0
07 Jul 2021
Improving Sound Event Classification by Increasing Shift Invariance in Convolutional Neural Networks
Eduardo Fonseca
Andrés Ferraro
Xavier Serra
AI4TS
27
9
0
01 Jul 2021
DCASE 2021 Task 3: Spectrotemporally-aligned Features for Polyphonic Sound Event Localization and Detection
Thi Ngoc Tho Nguyen
Karn N. Watcharasupat
Ngoc Khanh Nguyen
Douglas L. Jones
W. Gan
27
16
0
29 Jun 2021
Do sound event representations generalize to other audio tasks? A case study in audio transfer learning
Anurag Kumar
Yun Wang
V. Ithapu
Christian Fuegen
24
3
0
21 Jun 2021
ERANNs: Efficient Residual Audio Neural Networks for Audio Pattern Recognition
S. Verbitskiy
Vladimir Berikov
Viacheslav Vyshegorodtsev
24
73
0
03 Jun 2021
Sound Event Detection with Adaptive Frequency Selection
Zhepei Wang
Jonah Casebeer
Adam Clemmitt
Efthymios Tzinis
Paris Smaragdis
27
2
0
17 May 2021
The Benefit Of Temporally-Strong Labels In Audio Event Classification
Shawn Hershey
D. Ellis
Eduardo Fonseca
A. Jansen
Caroline Liu
Channing Moore
Manoj Plakal
26
103
0
14 May 2021
Voice activity detection in the wild: A data-driven approach using teacher-student training
Heinrich Dinkel
Shuai Wang
Xuenan Xu
Mengyue Wu
K. Yu
VLM
19
32
0
10 May 2021
Audio Retrieval with Natural Language Queries
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
21
77
0
05 May 2021
Self-Supervised Learning from Automatically Separated Sound Scenes
Eduardo Fonseca
A. Jansen
D. Ellis
Scott Wisdom
Marco Tagliasacchi
J. Hershey
Manoj Plakal
Shawn Hershey
R. C. Moore
Xavier Serra
SSL
44
13
0
05 May 2021
Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
Shixing Chen
Xiaohan Nie
David D. Fan
Dongqing Zhang
Vimal Bhat
Raffay Hamid
SSL
34
62
0
28 Apr 2021
The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System
Lorin Sweeney
Graham Healy
Alan F. Smeaton
39
11
0
23 Apr 2021
Efficient and Generic 1D Dilated Convolution Layer for Deep Learning
Narendra Chaudhary
Sanchit Misra
Dhiraj D. Kalamkar
A. Heinecke
E. Georganas
Barukh Ziv
Menachem Adelman
Bharat Kaul
32
9
0
16 Apr 2021