CNN Architectures for Large-Scale Audio Classification

29 September 2016
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
R. C. Moore
Manoj Plakal
D. Platt
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson

Papers citing "CNN Architectures for Large-Scale Audio Classification"

36 / 336 papers shown
Sound Event Detection Using Graph Laplacian Regularization Based on Event Co-occurrence
Keisuke Imoto
Seisuke Kyochi
9
12
0
02 Feb 2019
Enhancing Sound Texture in CNN-Based Acoustic Scene Classification
Yuzhong Wu
Tan Lee
12
39
0
06 Jan 2019
From FiLM to Video: Multi-turn Question Answering with Multi-modal Context
T. Nguyen
Shikhar Sharma
Hannes Schulz
Layla El Asri
15
33
0
17 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Learning to match transient sound events using attentional similarity for few-shot sound recognition
Szu-Yu Chou
Kai-Hsiang Cheng
J. Jang
Yi-Hsuan Yang
21
59
0
04 Dec 2018
SwishNet: A Fast Convolutional Neural Network for Speech, Music and Noise Classification and Segmentation
Md Shamim Hussain
M. A. Haque
23
48
0
01 Dec 2018
Learning Sound Events From Webly Labeled Data
Anurag Kumar
Ankit Parag Shah
Bhiksha Raj
Alexander G. Hauptmann
NoLa
29
12
0
25 Nov 2018
General audio tagging with ensembling convolutional neural network and statistical features
Kele Xu
Boqing Zhu
Qiuqiang Kong
Haibo Mi
Bo Ding
Dezhi Wang
Huaimin Wang
22
30
0
30 Oct 2018
Training neural audio classifiers with few data
Jordi Pons
Joan Serrà
Xavier Serra
16
57
0
24 Oct 2018
Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos
Dawei Liang
Edison Thomaz
12
80
0
19 Oct 2018
Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification
Jinlai Liu
Zehuan Yuan
Changhu Wang
24
9
0
16 Sep 2018
Self-Supervised Generation of Spatial Audio for 360 Video
Pedro Morgado
Nuno Vasconcelos
Timothy R. Langlois
Oliver Wang
MDE
24
171
0
07 Sep 2018
The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary
Guohao Li
Juan Carlos Niebles
Cees G. M. Snoek
Fabian Caba Heilbron
Humam Alwassel
Victor Escorcia
Ranjay Krishna
S. Buch
Cuong Duc Dao
42
65
0
11 Aug 2018
RUC+CMU: System Report for Dense Captioning Events in Videos
Shizhe Chen
Yuqing Song
Yida Zhao
Jiarong Qiu
Qin Jin
Alexander G. Hauptmann
16
7
0
22 Jun 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Chiori Hori
Huda AlAmri
Jue Wang
G. Wichern
Takaaki Hori
...
Raphael Gontijo-Lopes
Abhishek Das
Irfan Essa
Dhruv Batra
Devi Parikh
VGen
18
125
0
21 Jun 2018
Mining for meaning: from vision to language through multiple networks consensus
Iulia Duta
Andrei Liviu Nicolicioiu
Simion-Vlad Bogolin
Marius Leordeanu
18
3
0
05 Jun 2018
Adaptive pooling operators for weakly labeled sound event detection
Brian McFee
Justin Salamon
J. P. Bello
22
148
0
26 Apr 2018
A Closer Look at Weak Label Learning for Audio Events
Ankit Parag Shah
Anurag Kumar
Alexander G. Hauptmann
Bhiksha Raj
6
64
0
24 Apr 2018
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning
Qing Guo
Yuan-fang Wang
William Yang Wang
13
76
0
15 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
529
0
09 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
36
425
0
23 Mar 2018
Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds)
F. Groh
P. Wieschollek
Hendrik P. A. Lensch
3DPC
16
107
0
20 Mar 2018
Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics
Arindam Jati
P. Georgiou
SSL
13
48
0
22 Feb 2018
Adversarial Audio Synthesis
Chris Donahue
Julian McAuley
M. Puckette
GAN
27
602
0
12 Feb 2018
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
47
538
0
09 Jan 2018
Cross-modal Embeddings for Video and Audio Retrieval
Dídac Surís
A. Duarte
Amaia Salvador
Jordi Torres
Xavier Giró-i-Nieto
SSL
18
69
0
07 Jan 2018
Improved Inception-Residual Convolutional Neural Network for Object Recognition
Md. Zahangir Alom
Mahmudul Hasan
C. Yakopcic
T. Taha
V. Asari
46
116
0
28 Dec 2017
Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation
Fabian-Robert Stöter
Soumitro Chakrabarty
B. Edler
Emanuel Habets
BDL
29
37
0
12 Dec 2017
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification
Xiang Long
Chuang Gan
Gerard de Melo
Jiajun Wu
Xiao-Chang Liu
Shilei Wen
31
208
0
27 Nov 2017
Audio Set classification with attention model: A probabilistic perspective
Qiuqiang Kong
Yong-mei Xu
Wenwu Wang
Mark D. Plumbley
BDL
18
104
0
02 Nov 2017
Listening to the World Improves Speech Command Recognition
B. McMahan
D. Rao
26
38
0
23 Oct 2017
Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification
Yunlong Bian
Chuang Gan
Xiao-Chang Liu
Fu Li
Xiang Long
Yandong Li
Heng Qi
Jie Zhou
Shilei Wen
Yuanqing Lin
18
48
0
12 Aug 2017
VoxCeleb: a large-scale speaker identification dataset
Arsha Nagrani
Joon Son Chung
Andrew Zisserman
6
2,247
0
26 Jun 2017
Large-Scale YouTube-8M Video Understanding with Deep Neural Networks
Manuk Akopyan
Eshsou Khashba
22
7
0
14 Jun 2017
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition
Che-Wei Huang
Shrikanth S. Narayanan
HAI
27
25
0
07 Jun 2017
Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers
K. Hoover
Sourish Chaudhuri
C. Pantofaru
M. Slaney
Ian Sturdy
CVBM
14
32
0
31 May 2017