Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1610.09001
Cited By
SoundNet: Learning Sound Representations from Unlabeled Video
27 October 2016
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SoundNet: Learning Sound Representations from Unlabeled Video"
50 / 180 papers shown
Title
Urban Rhapsody: Large-scale exploration of urban soundscapes
Joao Rulff
Fabio Miranda
Maryam Hosseini
Marcos Lage
M. Cartwright
Graham Dove
J. P. Bello
Claudio T. Silva
19
7
0
25 May 2022
Weakly-Supervised Action Detection Guided by Audio Narration
Keren Ye
Adriana Kovashka
38
0
0
12 May 2022
Probabilistic Representations for Video Contrastive Learning
Jungin Park
Jiyoung Lee
Ig-Jae Kim
Kwanghoon Sohn
SSL
29
43
0
08 Apr 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Joey Tianyi Zhou
Gedas Bertasius
49
39
0
06 Apr 2022
Learning Neural Acoustic Fields
Andrew F. Luo
Yilun Du
Michael J. Tarr
J. Tenenbaum
Antonio Torralba
Chuang Gan
AI4CE
20
79
0
04 Apr 2022
1-D CNN based Acoustic Scene Classification via Reducing Layer-wise Dimensionality
Arshdeep Singh
25
1
0
31 Mar 2022
Multitask Emotion Recognition Model with Knowledge Distillation and Task Discriminator
Euiseok Jeong
Geesung Oh
Sejoon Lim
CVBM
22
7
0
24 Mar 2022
Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach
Dawei Liang
Zifan Xu
Yinuo Chen
Rebecca Adaimi
David Harwath
Edison Thomaz
48
1
0
21 Mar 2022
Learning Audio Representations with MLPs
Mashrur M. Morshed
Ahmad Omar Ahsan
H. Mahmud
Md. Kamrul Hasan
27
4
0
16 Mar 2022
Visually Supervised Speaker Detection and Localization via Microphone Array
Davide Berghi
A. Hilton
Philip J. B. Jackson
24
11
0
07 Mar 2022
Audio Self-supervised Learning: A Survey
Shuo Liu
Adria Mallol-Ragolta
Emilia Parada-Cabeleiro
Kun Qian
Xingshuo Jing
Alexander Kathan
Bin Hu
Bjoern W. Schuller
SSL
40
106
0
02 Mar 2022
Visual Sound Localization in the Wild by Cross-Modal Interference Erasing
Xian Liu
Rui Qian
Hang Zhou
Di Hu
Weiyao Lin
Ziwei Liu
Bolei Zhou
Xiaowei Zhou
18
25
0
13 Feb 2022
Real-time Emergency Vehicle Event Detection Using Audio Data
Zubayer Islam
Mohamed Abdel-Aty
14
5
0
03 Feb 2022
Keyword localisation in untranscribed speech using visually grounded speech models
Kayode Olaleye
Dan Oneaţă
Herman Kamper
32
7
0
02 Feb 2022
Sound and Visual Representation Learning with Multiple Pretraining Tasks
A. Vasudevan
Dengxin Dai
Luc Van Gool
SSL
33
6
0
04 Jan 2022
Multimodal Image Synthesis and Editing: The Generative AI Era
Fangneng Zhan
Yingchen Yu
Rongliang Wu
Jiahui Zhang
Shijian Lu
Lingjie Liu
Adam Kortylewski
Christian Theobalt
Eric Xing
EGVM
31
48
0
27 Dec 2021
Class-aware Sounding Objects Localization via Audiovisual Correspondence
Di Hu
Yake Wei
Rui Qian
Weiyao Lin
Ruihua Song
Ji-Rong Wen
24
41
0
22 Dec 2021
Everything at Once -- Multi-modal Fusion Transformer for Video Retrieval
Nina Shvetsova
Brian Chen
Andrew Rouditchenko
Samuel Thomas
Brian Kingsbury
Rogerio Feris
David Harwath
James R. Glass
Hilde Kuehne
ViT
34
128
0
08 Dec 2021
Health Monitoring of Industrial machines using Scene-Aware Threshold Selection
Arshdeep Singh
R. Arvind
Padmanabhan Rajan
19
1
0
21 Nov 2021
Geometry-Aware Multi-Task Learning for Binaural Audio Generation from Video
Rishabh Garg
Ruohan Gao
Kristen Grauman
15
28
0
21 Nov 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIP
VLM
42
268
0
21 Oct 2021
The Impact of Spatiotemporal Augmentations on Self-Supervised Audiovisual Representation Learning
Haider Al-Tahan
Y. Mohsenzadeh
SSL
AI4TS
34
0
0
13 Oct 2021
Attention is All You Need? Good Embeddings with Statistics are enough:Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or ....
Prateek Verma
AI4TS
32
2
0
07 Oct 2021
Understanding and Improving Usability of Data Dashboards for Simplified Privacy Control of Voice Assistant Data (Extended Version)
Vandit Sharma
Mainack Mondal
17
3
0
06 Oct 2021
Audio-Visual Collaborative Representation Learning for Dynamic Saliency Prediction
Hailong Ning
Bin Zhao
Zhanxuan Hu
Lang He
Ercheng Pei
32
10
0
17 Sep 2021
Parsing Birdsong with Deep Audio Embeddings
Irina Tolkova
Brian Chu
Marcel Hedman
Stefan Kahl
Holger Klinck
36
10
0
20 Aug 2021
LIGA-Stereo: Learning LiDAR Geometry Aware Representations for Stereo-based 3D Detector
Xiaoyang Guo
Shaoshuai Shi
Xiaogang Wang
Hongsheng Li
3DPC
34
106
0
18 Aug 2021
Cross-modal Spectrum Transformation Network For Acoustic Scene classification
Yang Liu
A. Neophytou
Sunando Sengupta
Eric Sommerlade
21
9
0
13 Aug 2021
Hybrid Reasoning Network for Video-based Commonsense Captioning
Weijiang Yu
Jian Liang
Lei Ji
Lu Li
Yuejian Fang
Nong Xiao
Nan Duan
19
10
0
05 Aug 2021
DarkGAN: Exploiting Knowledge Distillation for Comprehensible Audio Synthesis with GANs
J. Nistal
Stefan Lattner
G. Richard
21
8
0
03 Aug 2021
PERSA+: A Deep Learning Front-End for Context-Agnostic Audio Classification
Lazaros Vrysis
Iordanis Thoidis
Charalampos A. Dimoulas
G. Papanikolaou
VLM
33
0
0
20 Jul 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
27
26
0
20 Jul 2021
Attention Bottlenecks for Multimodal Fusion
Arsha Nagrani
Shan Yang
Anurag Arnab
A. Jansen
Cordelia Schmid
Chen Sun
25
543
0
30 Jun 2021
Multi-level Attention Fusion Network for Audio-visual Event Recognition
Mathilde Brousmiche
Jean Rouat
Stéphane Dupont
27
11
0
12 Jun 2021
VPN++: Rethinking Video-Pose embeddings for understanding Activities of Daily Living
Srijan Das
Rui Dai
Di Yang
F. Brémond
ViT
43
66
0
17 May 2021
Temporal-Spatial Feature Pyramid for Video Saliency Detection
Qinyao Chang
Shiping Zhu
41
27
0
10 May 2021
Ego-Exo: Transferring Visual Representations from Third-person to First-person Videos
Yanghao Li
Tushar Nagarajan
Bo Xiong
Kristen Grauman
EgoV
51
84
0
16 Apr 2021
Can audio-visual integration strengthen robustness under multimodal attacks?
Yapeng Tian
Chenliang Xu
AAML
36
37
0
05 Apr 2021
Unsupervised Sound Localization via Iterative Contrastive Learning
Yan-Bo Lin
Hung-Yu Tseng
Hsin-Ying Lee
Yen-Yu Lin
Ming-Hsuan Yang
SSL
27
34
0
01 Apr 2021
Space-Time Crop & Attend: Improving Cross-modal Video Representation Learning
Mandela Patrick
Yuki M. Asano
Bernie Huang
Ishan Misra
Florian Metze
Joao Henriques
Andrea Vedaldi
AI4TS
29
33
0
18 Mar 2021
Slow-Fast Auditory Streams For Audio Recognition
Evangelos Kazakos
Arsha Nagrani
Andrew Zisserman
Dima Damen
24
66
0
05 Mar 2021
Audio-Visual Speech Separation Using Cross-Modal Correspondence Loss
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
Ryo Masumura
30
8
0
02 Mar 2021
There is More than Meets the Eye: Self-Supervised Multi-Object Detection and Tracking with Sound by Distilling Multimodal Knowledge
Francisco Rivera Valverde
Juana Valeria Hurtado
Abhinav Valada
26
72
0
01 Mar 2021
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
44
20
0
05 Feb 2021
Environment Transfer for Distributed Systems
Chunheng Jiang
Jae-wook Ahn
N. Desai
28
1
0
06 Jan 2021
ViNet: Pushing the limits of Visual Modality for Audio-Visual Saliency Prediction
Samyak Jain
P. Yarlagadda
Shreyank Jyoti
Shyamgopal Karthik
Subramanian Ramanathan
Vineet Gandhi
ViT
29
66
0
11 Dec 2020
Learning to dance: A graph convolutional adversarial network to generate realistic dance motions from audio
João P. Ferreira
Thiago M. Coutinho
Thiago L. Gomes
J. F. Neto
Rafael Azevedo
Renato Martins
Erickson R. Nascimento
GAN
36
68
0
25 Nov 2020
Learning Representations from Audio-Visual Spatial Alignment
Pedro Morgado
Yi Li
Nuno Vasconcelos
SSL
27
121
0
03 Nov 2020
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu
Rundi Wu
Y. Ishiwaka
Carl Vondrick
Changxi Zheng
28
32
0
22 Oct 2020
Look, Listen, and Attend: Co-Attention Network for Self-Supervised Audio-Visual Representation Learning
Ying Cheng
Ruize Wang
Zhihao Pan
Rui Feng
Yuejie Zhang
SSL
36
106
0
13 Aug 2020
Previous
1
2
3
4
Next