ResearchTrend.AI
arXiv: 1610.09001
SoundNet: Learning Sound Representations from Unlabeled Video

27 October 2016
Y. Aytar
Carl Vondrick
Antonio Torralba
    SSL

Papers citing "SoundNet: Learning Sound Representations from Unlabeled Video"

30 / 180 papers shown
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
36
20
0
07 Dec 2018
Cogni-Net: Cognitive Feature Learning through Deep Visual Perception
Pranay Mukherjee
Abhirup Das
A. Bhunia
P. Roy
6
11
0
01 Nov 2018
Training neural audio classifiers with few data
Jordi Pons
Joan Serrà
Xavier Serra
21
57
0
24 Oct 2018
Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos
Dawei Liang
Edison Thomaz
14
80
0
19 Oct 2018
Self-Supervised Generation of Spatial Audio for 360 Video
Pedro Morgado
Nuno Vasconcelos
Timothy R. Langlois
Oliver Wang
MDE
24
171
0
07 Sep 2018
Emotion Recognition in Speech using Cross-Modal Transfer in the Wild
Samuel Albanie
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
CVBM
30
270
0
16 Aug 2018
End-to-End Audio Visual Scene-Aware Dialog using Multimodal Attention-Based Video Features
Chiori Hori
Huda AlAmri
Jue Wang
Gordon Wichern
Takaaki Hori
...
Raphael Gontijo-Lopes
Abhishek Das
Irfan Essa
Dhruv Batra
Devi Parikh
VGen
18
125
0
21 Jun 2018
Weakly-supervised Visual Instrument-playing Action Detection in Videos
Jen-Yu Liu
Yi-Hsuan Yang
Shyh-Kang Jeng
21
13
0
05 May 2018
Multimodal Emotion Recognition for One-Minute-Gradual Emotion Challenge
Ziqi Zheng
Chenjie Cao
Xingwei Chen
Guoqiang Xu
38
19
0
03 May 2018
Learnable PINs: Cross-Modal Embeddings for Person Identity
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
SSL
41
140
0
02 May 2018
A Bimodal Learning Approach to Assist Multi-sensory Effects Synchronization
R. Abreu
J. Santos
Eduardo Bezerra
26
8
0
28 Apr 2018
Weakly Supervised Representation Learning for Unsynchronized Audio-Visual Events
Sanjeel Parekh
S. Essid
A. Ozerov
Ngoc Q. K. Duong
P. Pérez
G. Richard
SSL
8
19
0
19 Apr 2018
The Sound of Pixels
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
22
529
0
09 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
19
219
0
01 Apr 2018
Learning Environmental Sounds with Multi-scale Convolutional Neural Network
Boqing Zhu
Changjian Wang
Feng Liu
Jin Lei
Zengquan Lu
Yuxing Peng
14
64
0
25 Mar 2018
Audio-Visual Event Localization in Unconstrained Videos
Yapeng Tian
Jing Shi
Bochen Li
Zhiyao Duan
Chenliang Xu
36
426
0
23 Mar 2018
Moments in Time Dataset: one million videos for event understanding
Mathew Monfort
A. Andonian
Bolei Zhou
K. Ramakrishnan
Sarah Adel Bargal
...
L. Brown
Quanfu Fan
Dan Gutfreund
Carl Vondrick
A. Oliva
47
538
0
09 Jan 2018
Learning Sight from Sound: Ambient Sound Provides Supervision for Visual Learning
Andrew Owens
Jiajun Wu
Josh H. McDermott
William T. Freeman
Antonio Torralba
SSL
41
177
0
20 Dec 2017
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
44
528
0
18 Dec 2017
Learning from Between-class Examples for Deep Sound Recognition
Yuji Tokozume
Yoshitaka Ushiku
Tatsuya Harada
SSL
24
236
0
28 Nov 2017
Semantic speech retrieval with a visually grounded model of untranscribed speech
Herman Kamper
Gregory Shakhnarovich
Karen Livescu
29
53
0
05 Oct 2017
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement
Aviv Gabbay
Ariel Ephrat
Tavi Halperin
Shmuel Peleg
31
19
0
22 Aug 2017
Audio Super Resolution using Neural Networks
Volodymyr Kuleshov
S. Enam
Stefano Ermon
SupR
28
126
0
02 Aug 2017
Automatic Curation of Golf Highlights using Multimodal Excitement Features
Michele Merler
D. Joshi
Q. Nguyen
Stephen Hammer
John Kent
John R. Smith
Rogerio Feris
VGen
24
18
0
22 Jul 2017
Comparison of Time-Frequency Representations for Environmental Sound Classification using Convolutional Neural Networks
M. Huzaifah
AI4TS
22
148
0
22 Jun 2017
Kapre: On-GPU Audio Preprocessing Layers for a Quick Implementation of Deep Neural Network Models with Keras
Keunwoo Choi
Deokjin Joo
Juho Kim
VLM
13
72
0
19 Jun 2017
Multimodal Machine Learning: A Survey and Taxonomy
T. Baltrušaitis
Chaitanya Ahuja
Louis-Philippe Morency
15
2,865
0
26 May 2017
Visually grounded learning of keyword prediction from untranscribed speech
Herman Kamper
Shane Settle
Gregory Shakhnarovich
Karen Livescu
19
63
0
23 Mar 2017
Generating Videos with Scene Dynamics
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
GAN
VGen
89
1,460
0
08 Sep 2016
Acoustic Scene Classification
D. Barchiesi
D. Giannoulis
D. Stowell
Mark D. Plumbley
102
406
0
13 Nov 2014