SoundNet: Learning Sound Representations from Unlabeled Video

27 October 2016

Y. Aytar

Carl Vondrick

Antonio Torralba

SSL


Papers citing "SoundNet: Learning Sound Representations from Unlabeled Video"

Showing 30 of 180 citing papers.

An Attempt towards Interpretable Audio-Visual Video Captioning. Yapeng Tian, Chenxiao Guan, Justin Goodman, Marc Moore, Chenliang Xu. 36 20 0. 07 Dec 2018
Cogni-Net: Cognitive Feature Learning through Deep Visual Perception. Pranay Mukherjee, Abhirup Das, A. Bhunia, P. Roy. 6 11 0. 01 Nov 2018
Training neural audio classifiers with few data. Jordi Pons, Joan Serrà, Xavier Serra. 21 57 0. 24 Oct 2018
Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos. Dawei Liang, Edison Thomaz. 14 80 0. 19 Oct 2018
Self-Supervised Generation of Spatial Audio for 360 Video. Pedro Morgado, Nuno Vasconcelos, Timothy R. Langlois, Oliver Wang. MDE. 24 171 0. 07 Sep 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild. Samuel Albanie, Arsha Nagrani, Andrea Vedaldi, Andrew Zisserman. CVBM. 30 270 0. 16 Aug 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features. Chiori Hori, Huda AlAmri, Jue Wang, Gordon Wichern, Takaaki Hori, ..., Raphael Gontijo-Lopes, Abhishek Das, Irfan Essa, Dhruv Batra, Devi Parikh. VGen. 18 125 0. 21 Jun 2018
Weakly-supervised Visual Instrument-playing Action Detection in Videos. Jen-Yu Liu, Yi-Hsuan Yang, Shyh-Kang Jeng. 21 13 0. 05 May 2018
Multimodal Emotion Recognition for One-Minute-Gradual Emotion Challenge. Ziqi Zheng, Chenjie Cao, Xingwei Chen, Guoqiang Xu. 38 19 0. 03 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity. Arsha Nagrani, Samuel Albanie, Andrew Zisserman. SSL. 41 140 0. 02 May 2018
A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization. R. Abreu, J. Santos, Eduardo Bezerra. 26 8 0. 28 Apr 2018
Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events. Sanjeel Parekh, S. Essid, A. Ozerov, Ngoc Q. K. Duong, P. Pérez, G. Richard. SSL. 8 19 0. 19 Apr 2018
The Sound of Pixels. Hang Zhao, Chuang Gan, Andrew Rouditchenko, Carl Vondrick, Josh H. McDermott, Antonio Torralba. VLM. 22 529 0. 09 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching. Arsha Nagrani, Samuel Albanie, Andrew Zisserman. CVBM. 19 219 0. 01 Apr 2018
Learning Environmental Sounds with Multi-scale Convolutional Neural Network. Boqing Zhu, Changjian Wang, Feng Liu, Jin Lei, Zengquan Lu, Yuxing Peng. 14 64 0. 25 Mar 2018
Audio-Visual Event Localization in Unconstrained Videos. Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu. 36 426 0. 23 Mar 2018
Moments in Time Dataset: one million videos for event understanding. Mathew Monfort, A. Andonian, Bolei Zhou, K. Ramakrishnan, Sarah Adel Bargal, ..., L. Brown, Quanfu Fan, Dan Gutfreund, Carl Vondrick, A. Oliva. 47 538 0. 09 Jan 2018
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning. Andrew Owens, Jiajun Wu, Josh H. McDermott, William T. Freeman, Antonio Torralba. SSL. 41 177 0. 20 Dec 2017
Objects that Sound. Relja Arandjelović, Andrew Zisserman. ObjD VOS. 44 528 0. 18 Dec 2017
Learning from Between-class Examples for Deep Sound Recognition. Yuji Tokozume, Yoshitaka Ushiku, Tatsuya Harada. SSL. 24 236 0. 28 Nov 2017
Semantic speech retrieval with a visually grounded model of untranscribed speech. Herman Kamper, Gregory Shakhnarovich, Karen Livescu. 29 53 0. 05 Oct 2017
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement. Aviv Gabbay, Ariel Ephrat, Tavi Halperin, Shmuel Peleg. 31 19 0. 22 Aug 2017
Audio Super Resolution using Neural Networks. Volodymyr Kuleshov, S. Enam, Stefano Ermon. SupR. 28 126 0. 02 Aug 2017
Automatic Curation of Golf Highlights using Multimodal Excitement Features. Michele Merler, D. Joshi, Q. Nguyen, Stephen Hammer, John Kent, John R. Smith, Rogerio Feris. VGen. 24 18 0. 22 Jul 2017
Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks. M. Huzaifah. AI4TS. 22 148 0. 22 Jun 2017
Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras. Keunwoo Choi, Deokjin Joo, Juho Kim. VLM. 13 72 0. 19 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy. T. Baltrušaitis, Chaitanya Ahuja, Louis-Philippe Morency. 15 2,865 0. 26 May 2017
Visually grounded learning of keyword prediction from untranscribed speech. Herman Kamper, Shane Settle, Gregory Shakhnarovich, Karen Livescu. 19 63 0. 23 Mar 2017
Generating Videos with Scene Dynamics. Carl Vondrick, Hamed Pirsiavash, Antonio Torralba. GAN VGen. 89 1,460 0. 08 Sep 2016
Acoustic Scene Classification. D. Barchiesi, D. Giannoulis, D. Stowell, Mark D. Plumbley. 102 406 0. 13 Nov 2014