Title
Sound Event Detection Using Graph Laplacian Regularization Based on Event Co-occurrence Keisuke Imoto Seisuke Kyochi 9 12 0 02 Feb 2019
Enhancing Sound Texture in CNN-Based Acoustic Scene Classification Yuzhong Wu Tan Lee 12 39 0 06 Jan 2019
From FiLM to Video: Multi-turn Question Answering with Multi-modal Context T. Nguyen Shikhar Sharma Hannes Schulz Layla El Asri 15 33 0 17 Dec 2018
An Attempt towards Interpretable Audio-Visual Video Captioning Yapeng Tian Chenxiao Guan Justin Goodman Marc Moore Chenliang Xu 36 20 0 07 Dec 2018
Learning to match transient sound events using attentional similarity for few-shot sound recognition Szu-Yu Chou Kai-Hsiang Cheng J. Jang Yi-Hsuan Yang 21 59 0 04 Dec 2018
SwishNet: A Fast Convolutional Neural Network for Speech, Music and Noise Classification and Segmentation Md Shamim Hussain M. A. Haque 23 48 0 01 Dec 2018
Learning Sound Events From Webly Labeled Data Anurag Kumar Ankit Parag Shah Bhiksha Raj Alexander G. Hauptmann NoLa 29 12 0 25 Nov 2018
General audio tagging with ensembling convolutional neural network and statistical features Kele Xu Boqing Zhu Qiuqiang Kong Haibo Mi Bo Ding Dezhi Wang Huaimin Wang 22 30 0 30 Oct 2018
Training neural audio classifiers with few data Jordi Pons Joan Serrà Xavier Serra 16 57 0 24 Oct 2018
Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos Dawei Liang Edison Thomaz 12 80 0 19 Oct 2018
Towards Good Practices for Multi-modal Fusion in Large-scale Video Classification Jinlai Liu Zehuan Yuan Changhu Wang 24 9 0 16 Sep 2018
Self-Supervised Generation of Spatial Audio for 360 Video Pedro Morgado Nuno Vasconcelos Timothy R. Langlois Oliver Wang MDE 24 171 0 07 Sep 2018
The ActivityNet Large-Scale Activity Recognition Challenge 2018 Summary Guohao Li Juan Carlos Niebles Cees G. M. Snoek Fabian Caba Heilbron Humam Alwassel Victor Escorcia Ranjay Krishna S. Buch Cuong Duc Dao 42 65 0 11 Aug 2018
RUC+CMU: System Report for Dense Captioning Events in Videos Shizhe Chen Yuqing Song Yida Zhao Jiarong Qiu Qin Jin Alexander G. Hauptmann 16 7 0 22 Jun 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features Chiori Hori Huda AlAmri Jue Wang G. Wichern Takaaki Hori ... Raphael Gontijo-Lopes Abhishek Das Irfan Essa Dhruv Batra Devi Parikh VGen 18 125 0 21 Jun 2018
Mining for meaning: from vision to language through multiple networks consensus Iulia Duta Andrei Liviu Nicolicioiu Simion-Vlad Bogolin Marius Leordeanu 18 3 0 05 Jun 2018
Adaptive pooling operators for weakly labeled sound event detection Brian McFee Justin Salamon J. P. Bello 22 148 0 26 Apr 2018
A Closer Look at Weak Label Learning for Audio Events Ankit Parag Shah Anurag Kumar Alexander G. Hauptmann Bhiksha Raj 6 64 0 24 Apr 2018
Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning Qing Guo Yuan-fang Wang William Yang Wang 13 76 0 15 Apr 2018
The Sound of Pixels Hang Zhao Chuang Gan Andrew Rouditchenko Carl Vondrick Josh H. McDermott Antonio Torralba VLM 22 529 0 09 Apr 2018
Audio-Visual Event Localization in Unconstrained Videos Yapeng Tian Jing Shi Bochen Li Zhiyao Duan Chenliang Xu 36 425 0 23 Mar 2018
Flex-Convolution (Million-Scale Point-Cloud Learning Beyond Grid-Worlds) F. Groh P. Wieschollek Hendrik P. A. Lensch 3DPC 16 107 0 20 Mar 2018
Neural Predictive Coding using Convolutional Neural Networks towards Unsupervised Learning of Speaker Characteristics Arindam Jati P. Georgiou SSL 13 48 0 22 Feb 2018
Adversarial Audio Synthesis Chris Donahue Julian McAuley M. Puckette GAN 27 602 0 12 Feb 2018
Moments in Time Dataset: one million videos for event understanding Mathew Monfort A. Andonian Bolei Zhou K. Ramakrishnan Sarah Adel Bargal ... L. Brown Quanfu Fan Dan Gutfreund Carl Vondrick A. Oliva 47 538 0 09 Jan 2018
Cross-modal Embeddings for Video and Audio Retrieval Dídac Surís A. Duarte Amaia Salvador Jordi Torres Xavier Giró-i-Nieto SSL 18 69 0 07 Jan 2018
Improved Inception-Residual Convolutional Neural Network for Object Recognition Md. Zahangir Alom Mahmudul Hasan C. Yakopcic T. Taha V. Asari 46 116 0 28 Dec 2017
Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation Fabian-Robert Stöter Soumitro Chakrabarty B. Edler Emanuel Habets BDL 29 37 0 12 Dec 2017
Attention Clusters: Purely Attention Based Local Feature Integration for Video Classification Xiang Long Chuang Gan Gerard de Melo Jiajun Wu Xiao-Chang Liu Shilei Wen 31 208 0 27 Nov 2017
Audio Set classification with attention model: A probabilistic perspective Qiuqiang Kong Yong-mei Xu Wenwu Wang Mark D. Plumbley BDL 18 104 0 02 Nov 2017
Listening to the World Improves Speech Command Recognition B. McMahan D. Rao 26 38 0 23 Oct 2017
Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification Yunlong Bian Chuang Gan Xiao-Chang Liu Fu Li Xiang Long Yandong Li Heng Qi Jie Zhou Shilei Wen Yuanqing Lin 18 48 0 12 Aug 2017
VoxCeleb: a large-scale speaker identification dataset Arsha Nagrani Joon Son Chung Andrew Zisserman 6 2,247 0 26 Jun 2017
Large-Scale YouTube-8M Video Understanding with Deep Neural Networks Manuk Akopyan Eshsou Khashba 22 7 0 14 Jun 2017
Characterizing Types of Convolution in Deep Convolutional Recurrent Neural Networks for Robust Speech Emotion Recognition Che-Wei Huang Shrikanth S. Narayanan HAI 27 25 0 07 Jun 2017
Putting a Face to the Voice: Fusing Audio and Visual Signals Across a Video to Determine Speakers K. Hoover Sourish Chaudhuri C. Pantofaru M. Slaney Ian Sturdy CVBM 14 32 0 31 May 2017