ResearchTrend.AI
CNN Architectures for Large-Scale Audio Classification

29 September 2016
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
R. C. Moore
Manoj Plakal
D. Platt
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
arXiv: 1609.09430

Papers citing "CNN Architectures for Large-Scale Audio Classification"

50 / 336 papers shown
The CORSMAL benchmark for the prediction of the properties of containers
Alessio Xompero
Santiago Donaher
Vladimir E. Iashin
Francesca Palermo
Gokhan Solak
...
G. Neeharika
Chinnakotla Krishna Teja Reddy
Dinesh Jain
B. Rehman
Andrea Cavallaro
30
10
0
27 Jul 2021
PERSA+: A Deep Learning Front-End for Context-Agnostic Audio Classification
Lazaros Vrysis
Iordanis Thoidis
Charalampos A. Dimoulas
G. Papanikolaou
VLM
33
0
0
20 Jul 2021
Project Achoo: A Practical Model and Application for COVID-19 Detection from Recordings of Breath, Voice, and Cough
Alexander Ponomarchuk
I. Burenko
Elian Malkin
Ivan Nazarov
V. Kokh
Manvel Avetisian
L. Zhukov
39
40
0
12 Jul 2021
Neural Waveshaping Synthesis
B. Hayes
C. Saitis
Gyorgy Fazekas
36
28
0
11 Jul 2021
Comparing Supervised Models And Learned Speech Representations For Classifying Intelligibility Of Disordered Speech On Selected Phrases
Subhashini Venugopalan
Joel Shor
Manoj Plakal
Jimmy Tobin
Katrin Tomanek
Jordan R. Green
Michael P. Brenner
27
12
0
08 Jul 2021
Audio-Oriented Multimodal Machine Comprehension: Task, Dataset and Model
Zhiqi Huang
Fenglin Liu
Xian Wu
Shen Ge
Helin Wang
Wei Fan
Yuexian Zou
AuLLM
29
2
0
04 Jul 2021
Continuous Emotion Recognition with Audio-visual Leader-follower Attentive Fusion
Su Zhang
Yi Ding
Ziquan Wei
Cuntai Guan
40
25
0
02 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
25
543
0
30 Jun 2021
Towards sound based testing of COVID-19 -- Summary of the first Diagnostics of COVID-19 using Acoustics (DiCOVA) Challenge
N. Sharma
Ananya Muguli
Prashant Krishnan
Rohit Kumar
Srikanth Raj Chetupalli
Sriram Ganapathy
33
13
0
21 Jun 2021
Zero-Shot Federated Learning with New Classes for Audio Classification
Gautham Krishna Gudur
S. K. Perepu
FedML
13
10
0
18 Jun 2021
Voice2Series: Reprogramming Acoustic Models for Time Series Classification
Chao-Han Huck Yang
Yun-Yun Tsai
Pin-Yu Chen
AI4TS
29
122
0
17 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
27
11
0
12 Jun 2021
Impact of data-splits on generalization: Identifying COVID-19 from cough and context
Makkunda Sharma
Nikhil Shenoy
Jigar Doshi
Piyush Bagad
Aman Dalmia
Parag Bhamare
A. Mahale
S. Rane
Neeraj Agrawal
R. Panicker
OOD
50
4
0
05 Jun 2021
Receptive Field Regularization Techniques for Audio Classification and Tagging with Deep Convolutional Neural Networks
Khaled Koutini
Hamid Eghbalzadeh
Gerhard Widmer
30
46
0
26 May 2021
Social Behaviour Understanding using Deep Neural Networks: Development of Social Intelligence Systems
Ethan Lim Ding Feng
Zhi-Wei Neo
Aaron William De Silva
Kellie Sim
Hong-Ray Tan
T. Nguyen
K. Koh
Wenru Wang
Hoang D. Nguyen
17
2
0
20 May 2021
Single-Layer Vision Transformers for More Accurate Early Exits with Less Overhead
Arian Bakhtiarnia
Qi Zhang
Alexandros Iosifidis
27
35
0
19 May 2021
Audio Retrieval with Natural Language Queries
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
21
77
0
05 May 2021
Shot Contrastive Self-Supervised Learning for Scene Boundary Detection
Shixing Chen
Xiaohan Nie
David D. Fan
Dongqing Zhang
Vimal Bhat
Raffay Hamid
SSL
27
62
0
28 Apr 2021
The Influence of Audio on Video Memorability with an Audio Gestalt Regulated Video Memorability System
Lorin Sweeney
Graham Healy
Alan F. Smeaton
19
11
0
23 Apr 2021
Room adaptive conditioning method for sound event classification in reverberant environments
Jaejun Lee
Donmoon Lee
Hyeong-Seok Choi
Kyogu Lee
23
2
0
21 Apr 2021
T2VLAD: Global-Local Sequence Alignment for Text-Video Retrieval
Xiaohan Wang
Linchao Zhu
Yi Yang
170
170
0
20 Apr 2021
VGNMN: Video-grounded Neural Module Network to Video-Grounded Language Tasks
Hung Le
Nancy F. Chen
Guosheng Lin
MLLM
26
19
0
16 Apr 2021
Comparison and Analysis of Deep Audio Embeddings for Music Emotion Recognition
E. Koh
Shlomo Dubnov
29
38
0
13 Apr 2021
Uncertainty-Aware COVID-19 Detection from Imbalanced Sound Data
Tong Xia
Jing Han
Lorena Qendro
T. Dang
Cecilia Mascolo
29
25
0
05 Apr 2021
SubSpectral Normalization for Neural Audio Data Processing
Simyung Chang
Hyoungwoo Park
Janghoon Cho
Hyunsin Park
Sungrack Yun
Kyuwoong Hwang
23
30
0
25 Mar 2021
MDMMT: Multidomain Multimodal Transformer for Video Retrieval
Maksim Dzabraev
M. Kalashnikov
Stepan Alekseevich Komkov
Aleksandr Petiushko
24
128
0
19 Mar 2021
Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
16
66
0
05 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
26
72
0
01 Mar 2021
Multi-modal Ensemble Models for Predicting Video Memorability
Tony Zhao
Irving Fang
Jeffrey Kim
Gerald Friedland
19
5
0
01 Feb 2021
A Case Study of Deep Learning Based Multi-Modal Methods for Predicting the Age-Suitability Rating of Movie Trailers
Mahsa Shafaei
C. Smailis
I. Kakadiaris
Thamar Solorio
141
1
0
26 Jan 2021
LEAF: A Learnable Frontend for Audio Classification
Neil Zeghidour
O. Teboul
Félix de Chaumont Quitry
Marco Tagliasacchi
VLM
AAML
85
144
0
21 Jan 2021
The Multimodal Sentiment Analysis in Car Reviews (MuSe-CaR) Dataset: Collection, Insights and Improvements
Lukas Stappen
Alice Baird
Lea Schumann
Björn Schuller
42
59
0
15 Jan 2021
Sound Event Detection with Binary Neural Networks on Tightly Power-Constrained IoT Devices
G. Cerutti
Renzo Andri
Lukas Cavigelli
Michele Magno
Elisabetta Farella
Luca Benini
MQ
21
37
0
12 Jan 2021
Environment Transfer for Distributed Systems
Chunheng Jiang
Jae-wook Ahn
N. Desai
28
1
0
06 Jan 2021
Context-Aware Personality Inference in Dyadic Scenarios: Introducing the UDIVA Dataset
Cristina Palmero
Javier Selva
Sorina Smeureanu
Julio C. S. Jacques Junior
Albert Clapés
...
Zejian Zhang
D. Gallardo-Pujol
G. Guilera
D. Leiva
Sergio Escalera
28
53
0
28 Dec 2020
Skeleton-DML: Deep Metric Learning for Skeleton-Based One-Shot Action Recognition
Raphael Memmesheimer
Simon Häring
Nick Theisen
Dietrich Paulus
35
36
0
26 Dec 2020
Analysis of Feature Representations for Anomalous Sound Detection
Robert Muller
Steffen Illium
Fabian Ritz
Kyrill Schmid
16
18
0
11 Dec 2020
Multi-Modal Detection of Alzheimer's Disease from Speech and Text
Amish Mittal
Sourav Sahoo
Arnhav Datar
Juned Kadiwala
H. Shalu
Jimson Mathew
12
20
0
30 Nov 2020
SoccerNet-v2: A Dataset and Benchmarks for Holistic Understanding of Broadcast Soccer Videos
A. Deliège
A. Cioppa
Silvio Giancola
M. J. Seikavandi
J. Dueholm
Kamal Nasrollahi
Guohao Li
T. Moeslund
Marc Van Droogenbroeck
18
152
0
26 Nov 2020
Virufy: Global Applicability of Crowdsourced and Clinical Datasets for AI Detection of COVID-19 from Cough
Gunvant R. Chaudhari
Xinyi Jiang
Ahmed E. Fakhry
Asriel Han
Jaclyn Xiao
Sabrina Shen
Amil Khanzada
21
92
0
26 Nov 2020
Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio
João P. Ferreira
Thiago M. Coutinho
Thiago L. Gomes
J. F. Neto
Rafael Azevedo
Renato Martins
Erickson R. Nascimento
GAN
36
68
0
25 Nov 2020
TMT: A Transformer-based Modal Translator for Improving Multimodal Sequence Representations in Audio Visual Scene-aware Dialog
Wubo Li
Dongwei Jiang
Wei Zou
Xiangang Li
23
6
0
21 Oct 2020
Real-time Speech Frequency Bandwidth Extension
Yunpeng Li
Marco Tagliasacchi
Oleg Rybakov
Victor Ungureanu
Dominik Roblek
17
47
0
21 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues
Hung Le
Doyen Sahoo
Nancy F. Chen
Guosheng Lin
44
30
0
20 Oct 2020
CLAR: Contrastive Learning of Auditory Representations
Haider Al-Tahan
Y. Mohsenzadeh
SSL
118
56
0
19 Oct 2020
Joint Analysis of Sound Events and Acoustic Scenes Using Multitask Learning
Noriyuki Tonami
Keisuke Imoto
Ryosuke Yamanishi
Y. Yamashita
23
13
0
16 Oct 2020
TRECVID 2019: An Evaluation Campaign to Benchmark Video Activity Detection, Video Captioning and Matching, and Video Search & Retrieval
G. Awad
A. Butt
Keith Curtis
Yooyoung Lee
Jonathan G. Fiscus
...
Lukas L. Diduch
Alan F. Smeaton
Yvette Graham
Wessel Kraaij
Georges Quénot
20
70
0
21 Sep 2020
Cough Against COVID: Evidence of COVID-19 Signature in Cough Sounds
Piyush Bagad
Aman Dalmia
Jigar Doshi
Arsha Nagrani
Parag Bhamare
A. Mahale
S. Rane
N. Agarwal
R. Panicker
34
112
0
17 Sep 2020
Exploiting Multi-Modal Features From Pre-trained Networks for Alzheimer's Dementia Recognition
Junghyun Koo
Jie Hwan Lee
Jaewoo Pyo
Yujin Jo
Kyogu Lee
27
58
0
09 Sep 2020
CRNNs for Urban Sound Tagging with spatiotemporal context
Augustin Arnault
Nicolas Riche
25
7
0
24 Aug 2020