ResearchTrend.AI
CNN Architectures for Large-Scale Audio Classification

29 September 2016
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
R. C. Moore
Manoj Plakal
D. Platt
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson

Papers citing "CNN Architectures for Large-Scale Audio Classification"

50 / 336 papers shown
Continuous Emotion Recognition using Visual-audio-linguistic information: A Technical Report for ABAW3
Su Zhang
Ruyi An
Yi Ding
Cuntai Guan
19
28
0
24 Mar 2022
MDMMT-2: Multidomain Multimodal Transformer for Video Retrieval, One More Step Towards Generalization
Alexander Kunitsyn
M. Kalashnikov
Maksim Dzabraev
Andrei Ivaniuta
30
16
0
14 Mar 2022
Comparison of Spatio-Temporal Models for Human Motion and Pose Forecasting in Face-to-Face Interaction Scenarios
Germán Barquero
Johnny Núnez
Zhen Xu
Sergio Escalera
Wei-Wei Tu
Isabelle M Guyon
Cristina Palmero
CVBM
45
12
0
07 Mar 2022
TRILLsson: Distilled Universal Paralinguistic Speech Representations
Joel Shor
Subhashini Venugopalan
25
37
0
01 Mar 2022
Multi-view and Multi-modal Event Detection Utilizing Transformer-based Multi-sensor fusion
Masahiro Yasuda
Yasunori Ohishi
Shoichiro Saito
N. Harada
38
13
0
18 Feb 2022
ADIMA: Abuse Detection In Multilingual Audio
Vikram Gupta
Rini A. Sharon
Ramit Sawhney
Debdoot Mukherjee
21
19
0
16 Feb 2022
Maximizing Audio Event Detection Model Performance on Small Datasets Through Knowledge Transfer, Data Augmentation, And Pretraining: An Ablation Study
Daniel C. Tompkins
Kshitiz Kumar
Jian Wu
17
5
0
07 Feb 2022
Prediction of Neonatal Respiratory Distress in Term Babies at Birth from Digital Stethoscope Recorded Chest Sounds
Ethan Grooby
C. Sitaula
K. Tan
Lindsay Zhou
Arrabella King
Ashwin Ramanathan
A. Malhotra
G. Dumont
F. Marzbanrad
10
4
0
25 Jan 2022
Action Keypoint Network for Efficient Video Recognition
Xu Chen
Yahong Han
Xiaohan Wang
Yifang Sun
Yi Yang
3DPC
27
6
0
17 Jan 2022
Continual Transformers: Redundancy-Free Attention for Online Inference
Lukas Hedegaard
Arian Bakhtiarnia
Alexandros Iosifidis
CLL
27
11
0
17 Jan 2022
Sub-mW Keyword Spotting on an MCU: Analog Binary Feature Extraction and Binary Neural Networks
G. Cerutti
Lukas Cavigelli
Renzo Andri
Michele Magno
Elisabetta Farella
Luca Benini
26
14
0
10 Jan 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks
A. Vasudevan
Dengxin Dai
Luc Van Gool
SSL
33
6
0
04 Jan 2022
Towards Relatable Explainable AI with the Perceptual Process
Wencan Zhang
Brian Y. Lim
AAML
XAI
25
62
0
28 Dec 2021
Cross Modal Retrieval with Querybank Normalisation
Simion-Vlad Bogolin
Ioana Croitoru
Hailin Jin
Yang Liu
Samuel Albanie
27
84
0
23 Dec 2021
Multimodal Personality Recognition using Cross-Attention Transformer and Behaviour Encoding
Tanay Agrawal
Dhruv Agarwal
Michal Balazia
Neelabh Sinha
F. Brémond
ViT
17
14
0
22 Dec 2021
Tell me what you see: A zero-shot action recognition method based on natural language descriptions
Valter Estevam
Rayson Laroca
David Menotti
Hélio Pedrini
33
13
0
18 Dec 2021
Benchmarking Uncertainty Quantification on Biosignal Classification Tasks under Dataset Shift
Tong Xia
Jing Han
Cecilia Mascolo
OOD
24
10
0
16 Dec 2021
Computational bioacoustics with deep learning: a review and roadmap
D. Stowell
32
235
0
13 Dec 2021
Overview of The MediaEval 2021 Predicting Media Memorability Task
R. Kiziltepe
M. Constantin
C. Demarty
Graham Healy
Camilo Luciano Fosco
...
S. Halder
Bogdan Ionescu
A. Matran-Fernandez
Alan F. Smeaton
Lorin Sweeney
21
13
0
11 Dec 2021
VocBench: A Neural Vocoder Benchmark for Speech Synthesis
Ehab A. AlBadawy
Andrew Gibiansky
Qing He
Jilong Wu
Ming-Ching Chang
Siwei Lyu
22
12
0
06 Dec 2021
Sound-Guided Semantic Image Manipulation
Seung Hyun Lee
Wonseok Roh
Wonmin Byeon
Sang Ho Yoon
Chanyoung Kim
Jinkyu Kim
Sangpil Kim
DiffM
27
43
0
30 Nov 2021
SP-SEDT: Self-supervised Pre-training for Sound Event Detection Transformer
Zhi-qin Ye
Xiangdong Wang
Hong Liu
Yueliang Qian
Ruijie Tao
Long Yan
Kazushige Ouchi
ViT
24
2
0
30 Nov 2021
CLIP Meets Video Captioning: Concept-Aware Representation Learning Does Matter
Bang-ju Yang
Tong Zhang
Yuexian Zou
CLIP
25
20
0
30 Nov 2021
Masking Modalities for Cross-modal Video Retrieval
Valentin Gabeur
Arsha Nagrani
Chen Sun
Alahari Karteek
Cordelia Schmid
19
29
0
01 Nov 2021
EfficientWord-Net: An Open Source Hotword Detection Engine based on One-shot Learning
R. Chidhambararajan
Aman Rangaur
S. C. Sethuraman
14
4
0
31 Oct 2021
Physics-informed linear regression is competitive with two Machine Learning methods in residential building MPC
Felix Bünning
B. Huber
Adrian Schalbetter
Ahmed Aboudonia
Mathias Hudoba de Badyn
Philipp Heer
Roy S. Smith
John Lygeros
AI4CE
22
65
0
29 Oct 2021
Detecting Dementia from Speech and Transcripts using Transformers
Loukas Ilias
D. Askounis
J. Psarras
16
32
0
27 Oct 2021
TriBERT: Full-body Human-centric Audio-visual Representation Learning for Visual Sound Separation
Tanzila Rahman
Mengyu Yang
Leonid Sigal
ViT
29
8
0
26 Oct 2021
DECAR: Deep Clustering for learning general-purpose Audio Representations
Sreyan Ghosh
Sandesh V Katta
Ashish Seth
S. Umesh
SSL
36
12
0
17 Oct 2021
Taming Visually Guided Sound Generation
Vladimir E. Iashin
Esa Rahtu
VLM
32
122
0
17 Oct 2021
Rank-based loss for learning hierarchical representations
I. Nolasco
D. Stowell
21
8
0
11 Oct 2021
Universal Paralinguistic Speech Representations Using Self-Supervised Conformers
Joel Shor
A. Jansen
Wei Han
Daniel S. Park
Yu Zhang
SSL
AI4TS
43
54
0
09 Oct 2021
Aura: Privacy-preserving Augmentation to Improve Test Set Diversity in Speech Enhancement
Xavier Gitiaux
Aditya Khant
Ebrahim Beyrami
Chandan K. A. Reddy
J. Gupchup
Ross Cutler
22
0
0
08 Oct 2021
PHNNs: Lightweight Neural Networks via Parameterized Hypercomplex Convolutions
Eleonora Grassucci
Aston Zhang
Danilo Comminiello
28
38
0
08 Oct 2021
SERAB: A multi-lingual benchmark for speech emotion recognition
Neil Scheidwasser
M. Kegler
P. Beckmann
Milos Cernak
32
44
0
07 Oct 2021
Attention is All You Need? Good Embeddings with Statistics are enough: Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....
Prateek Verma
AI4TS
32
2
0
07 Oct 2021
Sound Event Detection Transformer: An Event-based End-to-End Model for Sound Event Detection
Zhi-qin Ye
Xiangdong Wang
Hong Liu
Yueliang Qian
Ruijie Tao
Long Yan
Kazushige Ouchi
ViT
35
15
0
05 Oct 2021
Procedure Planning in Instructional Videos via Contextual Modeling and Model-based Policy Learning
Jing Bi
Jiebo Luo
Chenliang Xu
76
48
0
05 Oct 2021
Hierarchical Multimodal Transformer to Summarize Videos
Bin Zhao
Maoguo Gong
Xuelong Li
ViT
30
55
0
22 Sep 2021
Audio Interval Retrieval using Convolutional Neural Networks
I. Kuzminykh
Dan Shevchuk
S. Shiaeles
Bogdan Ghita
23
7
0
21 Sep 2021
Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
D. Curto
Albert Clapés
Javier Selva
Sorina Smeureanu
Julio C. S. Jacques Junior
...
G. Guilera
D. Leiva
T. Moeslund
Sergio Escalera
Cristina Palmero
46
29
0
20 Sep 2021
Timbre Transfer with Variational Auto Encoding and Cycle-Consistent Adversarial Networks
Russell Sammut Bonnici
C. Saitis
Martin Benning
GAN
30
15
0
05 Sep 2021
Audio-Visual Transformer Based Crowd Counting
Usman Sajid
Xiangyu Chen
Hasan Sajid
Taejoon Kim
Guanghui Wang
ViT
43
22
0
04 Sep 2021
Multi-modal Representation Learning for Video Advertisement Content Structuring
Daya Guo
Zhaoyang Zeng
27
4
0
04 Sep 2021
EarGate: Gait-based User Identification with In-ear Microphones
Andrea Ferlini
Dong Ma
R. Harle
Cecilia Mascolo
13
71
0
27 Aug 2021
Parsing Birdsong with Deep Audio Embeddings
Irina Tolkova
Brian Chu
Marcel Hedman
Stefan Kahl
Holger Klinck
36
10
0
20 Aug 2021
Mounting Video Metadata on Transformer-based Language Model for Open-ended Video Question Answering
Donggeon Lee
Seongho Choi
Youwon Jang
Byoung-Tak Zhang
16
2
0
11 Aug 2021
Optimizing Latency for Online Video Captioning Using Audio-Visual Transformers
Chiori Hori
Takaaki Hori
Jonathan Le Roux
25
4
0
04 Aug 2021
Improving Music Performance Assessment with Contrastive Learning
Pavan Seshadri
Alexander Lerch
19
8
0
03 Aug 2021
Multimodal Feature Fusion for Video Advertisements Tagging Via Stacking Ensemble
Qingsong Zhou
Hai Liang
Zhimin Lin
Kele Xu
37
5
0
02 Aug 2021