Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1804.03160
Cited By
The Sound of Pixels
9 April 2018
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
VLM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"The Sound of Pixels"
39 / 39 papers shown
Title
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
Hilde Kuehne
58
0
0
02 May 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen
VGen
128
1
0
01 Apr 2025
Audio Visual Segmentation Through Text Embeddings
Kyungbok Lee
You Zhang
Z. Duan
84
0
0
22 Feb 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
82
7
0
10 Feb 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
83
28
0
02 Jan 2025
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
123
4
0
23 Nov 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
80
3
0
14 Oct 2024
Language-Queried Target Sound Extraction Without Parallel Training Data
Hao Ma
Zhiyuan Peng
Xu Li
Yukai Li
Mingjie Shao
Qiuqiang Kong
Xuelong Li
VLM
106
2
0
14 Sep 2024
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serrà
58
2
0
08 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
69
2
0
02 Jul 2024
Multimodal Deep Learning
Cem Akkus
Jiquan Ngiam
Vladana Djakovic
Steffen Jauch-Walser
A. Khosla
...
Jann Goschenhofer
Honglak Lee
A. Ng
Daniel Schalk
Matthias Aßenmacher
42
3,161
0
12 Jan 2023
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
60
20
0
05 Feb 2021
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
45
75
0
11 Jun 2020
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
97
88
0
24 Oct 2019
Tracking Emerges by Colorizing Videos
Carl Vondrick
Abhinav Shrivastava
Alireza Fathi
S. Guadarrama
Kevin Patrick Murphy
69
376
0
25 Jun 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
69
747
0
10 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
41
220
0
01 Apr 2018
Learning to Localize Sound Source in Visual Scenes
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
50
344
0
10 Mar 2018
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
59
529
0
18 Dec 2017
Visual to Sound: Generating Natural Sound for Videos in the Wild
Yipin Zhou
Zhaowen Wang
Chen Fang
Trung Bui
Tamara L. Berg
VGen
47
206
0
04 Dec 2017
Supervised Speech Separation Based on Deep Learning: An Overview
DeLiang Wang
Jitong Chen
SSL
45
1,359
0
24 Aug 2017
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement
Aviv Gabbay
Ariel Ephrat
Tavi Halperin
Shmuel Peleg
54
19
0
22 Aug 2017
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
65
897
0
23 May 2017
Neural Face Editing with Intrinsic Image Disentangling
Zhixin Shu
Ersin Yumer
Sunil Hadap
Kalyan Sunkavalli
Eli Shechtman
Dimitris Samaras
CVBM
DRL
GAN
68
285
0
13 Apr 2017
Colorization as a Proxy Task for Visual Understanding
Gustav Larsson
Michael Maire
Gregory Shakhnarovich
SSL
128
494
0
11 Mar 2017
Creating A Multi-track Classical Musical Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications
Bochen Li
Xinzhao Liu
K. Dinesh
Z. Duan
Gaurav Sharma
99
150
0
27 Dec 2016
Learning Features by Watching Objects Move
Deepak Pathak
Ross B. Girshick
Piotr Dollár
Trevor Darrell
Bharath Hariharan
SSL
VOS
OCL
55
523
0
19 Dec 2016
SoundNet: Learning Sound Representations from Unlabeled Video
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
77
1,040
0
27 Oct 2016
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
86
2,488
0
29 Sep 2016
Generating Videos with Scene Dynamics
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
GAN
VGen
135
1,464
0
08 Sep 2016
Context Encoders: Feature Learning by Inpainting
Deepak Pathak
Philipp Krahenbuhl
Jeff Donahue
Trevor Darrell
Alexei A. Efros
SSL
36
5,277
0
25 Apr 2016
Visually Indicated Sounds
Andrew Owens
Phillip Isola
Josh H. McDermott
Antonio Torralba
Edward H. Adelson
William T. Freeman
66
382
0
28 Dec 2015
Learning Deep Features for Discriminative Localization
Bolei Zhou
A. Khosla
Àgata Lapedriza
A. Oliva
Antonio Torralba
SSL
SSeg
FAtt
123
9,266
0
14 Dec 2015
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.2K
192,638
0
10 Dec 2015
Deep clustering: Discriminative embeddings for segmentation and separation
J. Hershey
Zhuo Chen
Jonathan Le Roux
Shinji Watanabe
39
1,316
0
18 Aug 2015
Unsupervised Visual Representation Learning by Context Prediction
Carl Doersch
Abhinav Gupta
Alexei A. Efros
DRL
SSL
135
2,777
0
19 May 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
989
76,547
0
18 May 2015
Learning image representations tied to ego-motion
Dinesh Jayaraman
Kristen Grauman
SSL
60
245
0
08 May 2015
Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Andrew J. R. Simpson
Gerard Roma
Mark D. Plumbley
50
102
0
17 Apr 2015
1