arXiv:1610.09001
Cited By
SoundNet: Learning Sound Representations from Unlabeled Video
27 October 2016
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
Papers citing "SoundNet: Learning Sound Representations from Unlabeled Video"
30 / 180 papers shown
Title
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Cogni-Net: Cognitive Feature Learning through Deep Visual Perception
Pranay Mukherjee
Abhirup Das
A. Bhunia
P. Roy
6
11
0
01 Nov 2018
Training neural audio classifiers with few data
Jordi Pons
Joan Serrà
Xavier Serra
21
57
0
24 Oct 2018
Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos
Dawei Liang
Edison Thomaz
14
80
0
19 Oct 2018
Self-Supervised Generation of Spatial Audio for 360 Video
Pedro Morgado
Nuno Vasconcelos
Timothy R. Langlois
Oliver Wang
MDE
24
171
0
07 Sep 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
30
270
0
16 Aug 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Chiori Hori
Huda AlAmri
Jue Wang
Gordon Wichern
Takaaki Hori
...
Raphael Gontijo-Lopes
Abhishek Das
Irfan Essa
Dhruv Batra
Devi Parikh
VGen
18
125
0
21 Jun 2018
Weakly-supervised Visual Instrument-playing Action Detection in Videos
Jen-Yu Liu
Yi-Hsuan Yang
Shyh-Kang Jeng
21
13
0
05 May 2018
Multimodal Emotion Recognition for One-Minute-Gradual Emotion Challenge
Ziqi Zheng
Chenjie Cao
Xingwei Chen
Guoqiang Xu
38
19
0
03 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
41
140
0
02 May 2018
A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization
R. Abreu
J. Santos
Eduardo Bezerra
26
8
0
28 Apr 2018
Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events
Sanjeel Parekh
S. Essid
A. Ozerov
Ngoc Q. K. Duong
P. Pérez
G. Richard
SSL
8
19
0
19 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
529
0
09 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
19
219
0
01 Apr 2018
Learning Environmental Sounds with Multi-scale Convolutional Neural Network
Boqing Zhu
Changjian Wang
Feng Liu
Jin Lei
Zengquan Lu
Yuxing Peng
14
64
0
25 Mar 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
36
426
0
23 Mar 2018
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
47
538
0
09 Jan 2018
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
Andrew Owens
Jiajun Wu
Josh H. McDermott
William T. Freeman
Antonio Torralba
SSL
41
177
0
20 Dec 2017
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
44
528
0
18 Dec 2017
Learning from Between-class Examples for Deep Sound Recognition
Yuji Tokozume
Yoshitaka Ushiku
Tatsuya Harada
SSL
24
236
0
28 Nov 2017
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
29
53
0
05 Oct 2017
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement
Aviv Gabbay
Ariel Ephrat
Tavi Halperin
Shmuel Peleg
31
19
0
22 Aug 2017
Audio Super Resolution using Neural Networks
Volodymyr Kuleshov
S. Enam
Stefano Ermon
SupR
28
126
0
02 Aug 2017
Automatic Curation of Golf Highlights using Multimodal Excitement Features
Michele Merler
D. Joshi
Q. Nguyen
Stephen Hammer
John Kent
John R. Smith
Rogerio Feris
VGen
24
18
0
22 Jul 2017
Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
M. Huzaifah
AI4TS
22
148
0
22 Jun 2017
Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras
Keunwoo Choi
Deokjin Joo
Juho Kim
VLM
13
72
0
19 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
15
2,865
0
26 May 2017
Visually grounded learning of keyword prediction from untranscribed speech
Herman Kamper
Shane Settle
Gregory Shakhnarovich
Karen Livescu
19
63
0
23 Mar 2017
Generating Videos with Scene Dynamics
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
GAN
VGen
89
1,460
0
08 Sep 2016
Acoustic Scene Classification
D. Barchiesi
D. Giannoulis
D. Stowell
Mark D. Plumbley
102
406
0
13 Nov 2014