ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1804.03160
  4. Cited By
The Sound of Pixels

The Sound of Pixels

9 April 2018
Hang Zhao
Chuang Gan
Andrew Rouditchenko
Carl Vondrick
Josh H. McDermott
Antonio Torralba
    VLM
ArXivPDFHTML

Papers citing "The Sound of Pixels"

39 / 39 papers shown
Title
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Edson Araujo
Andrew Rouditchenko
Yuan Gong
Saurabhchand Bhati
Samuel Thomas
Brian Kingsbury
Leonid Karlinsky
Rogerio Feris
James Glass
Hilde Kuehne
58
0
0
02 May 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kai Zhang
MGen
VGen
144
1
0
01 Apr 2025
Audio Visual Segmentation Through Text Embeddings
Audio Visual Segmentation Through Text Embeddings
Kyungbok Lee
You Zhang
Z. Duan
84
0
0
22 Feb 2025
Learning Musical Representations for Music Performance Question Answering
Xingjian Diao
Chunhui Zhang
Tingxuan Wu
Ming Cheng
Z. Ouyang
Weiyi Wu
Jiang Gui
87
7
0
10 Feb 2025
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Reading to Listen at the Cocktail Party: Multi-Modal Speech Separation
Akam Rahimi
Triantafyllos Afouras
Andrew Zisserman
83
28
0
02 Jan 2025
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
123
4
0
23 Nov 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent
  Approach
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
82
3
0
14 Oct 2024
Language-Queried Target Sound Extraction Without Parallel Training Data
Language-Queried Target Sound Extraction Without Parallel Training Data
Hao Ma
Zhiyuan Peng
Xu Li
Yukai Li
Mingjie Shao
Qiuqiang Kong
Xuelong Li
VLM
109
2
0
14 Sep 2024
Sequential Contrastive Audio-Visual Learning
Sequential Contrastive Audio-Visual Learning
Ioannis Tsiamas
Santiago Pascual
Chunghsin Yeh
Joan Serrà
62
2
0
08 Jul 2024
SOAF: Scene Occlusion-aware Neural Acoustic Field
SOAF: Scene Occlusion-aware Neural Acoustic Field
Huiyu Gao
Jiahao Ma
David Ahmedt-Aristizabal
Chuong H. Nguyen
Miaomiao Liu
69
2
0
02 Jul 2024
Multimodal Deep Learning
Multimodal Deep Learning
Cem Akkus
Jiquan Ngiam
Vladana Djakovic
Steffen Jauch-Walser
A. Khosla
...
Jann Goschenhofer
Honglak Lee
A. Ng
Daniel Schalk
Matthias Aßenmacher
57
3,161
0
12 Jan 2023
Learning Audio-Visual Correlations from Variational Cross-Modal
  Generation
Learning Audio-Visual Correlations from Variational Cross-Modal Generation
Ye Zhu
Yu Wu
Hugo Latapie
Yi Yang
Yan Yan
SSL
67
20
0
05 Feb 2021
Telling Left from Right: Learning Spatial Correspondence of Sight and
  Sound
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
56
75
0
11 Jun 2020
Vision-Infused Deep Audio Inpainting
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
108
88
0
24 Oct 2019
Tracking Emerges by Colorizing Videos
Tracking Emerges by Colorizing Videos
Carl Vondrick
Abhinav Shrivastava
Alireza Fathi
S. Guadarrama
Kevin Patrick Murphy
71
376
0
25 Jun 2018
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Audio-Visual Scene Analysis with Self-Supervised Multisensory Features
Andrew Owens
Alexei A. Efros
SSL
72
747
0
10 Apr 2018
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Seeing Voices and Hearing Faces: Cross-modal biometric matching
Arsha Nagrani
Samuel Albanie
Andrew Zisserman
CVBM
46
220
0
01 Apr 2018
Learning to Localize Sound Source in Visual Scenes
Learning to Localize Sound Source in Visual Scenes
Arda Senocak
Tae-Hyun Oh
Junsik Kim
Ming-Hsuan Yang
In So Kweon
SSL
55
344
0
10 Mar 2018
Objects that Sound
Objects that Sound
Relja Arandjelović
Andrew Zisserman
ObjD
VOS
68
529
0
18 Dec 2017
Visual to Sound: Generating Natural Sound for Videos in the Wild
Visual to Sound: Generating Natural Sound for Videos in the Wild
Yipin Zhou
Zhaowen Wang
Chen Fang
Trung Bui
Tamara L. Berg
VGen
47
206
0
04 Dec 2017
Supervised Speech Separation Based on Deep Learning: An Overview
Supervised Speech Separation Based on Deep Learning: An Overview
DeLiang Wang
Jitong Chen
SSL
48
1,359
0
24 Aug 2017
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement
Seeing Through Noise: Visually Driven Speaker Separation and Enhancement
Aviv Gabbay
Ariel Ephrat
Tavi Halperin
Shmuel Peleg
54
19
0
22 Aug 2017
Look, Listen and Learn
Look, Listen and Learn
Relja Arandjelović
Andrew Zisserman
SSL
78
897
0
23 May 2017
Neural Face Editing with Intrinsic Image Disentangling
Neural Face Editing with Intrinsic Image Disentangling
Zhixin Shu
Ersin Yumer
Sunil Hadap
Kalyan Sunkavalli
Eli Shechtman
Dimitris Samaras
CVBM
DRL
GAN
73
285
0
13 Apr 2017
Colorization as a Proxy Task for Visual Understanding
Colorization as a Proxy Task for Visual Understanding
Gustav Larsson
Michael Maire
Gregory Shakhnarovich
SSL
128
494
0
11 Mar 2017
Creating A Multi-track Classical Musical Performance Dataset for
  Multimodal Music Analysis: Challenges, Insights, and Applications
Creating A Multi-track Classical Musical Performance Dataset for Multimodal Music Analysis: Challenges, Insights, and Applications
Bochen Li
Xinzhao Liu
K. Dinesh
Z. Duan
Gaurav Sharma
108
150
0
27 Dec 2016
Learning Features by Watching Objects Move
Learning Features by Watching Objects Move
Deepak Pathak
Ross B. Girshick
Piotr Dollár
Trevor Darrell
Bharath Hariharan
SSL
VOS
OCL
60
523
0
19 Dec 2016
SoundNet: Learning Sound Representations from Unlabeled Video
SoundNet: Learning Sound Representations from Unlabeled Video
Y. Aytar
Carl Vondrick
Antonio Torralba
SSL
80
1,040
0
27 Oct 2016
CNN Architectures for Large-Scale Audio Classification
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
86
2,488
0
29 Sep 2016
Generating Videos with Scene Dynamics
Generating Videos with Scene Dynamics
Carl Vondrick
Hamed Pirsiavash
Antonio Torralba
GAN
VGen
140
1,464
0
08 Sep 2016
Context Encoders: Feature Learning by Inpainting
Context Encoders: Feature Learning by Inpainting
Deepak Pathak
Philipp Krahenbuhl
Jeff Donahue
Trevor Darrell
Alexei A. Efros
SSL
46
5,277
0
25 Apr 2016
Visually Indicated Sounds
Visually Indicated Sounds
Andrew Owens
Phillip Isola
Josh H. McDermott
Antonio Torralba
Edward H. Adelson
William T. Freeman
72
382
0
28 Dec 2015
Learning Deep Features for Discriminative Localization
Learning Deep Features for Discriminative Localization
Bolei Zhou
A. Khosla
Àgata Lapedriza
A. Oliva
Antonio Torralba
SSL
SSeg
FAtt
159
9,266
0
14 Dec 2015
Deep Residual Learning for Image Recognition
Deep Residual Learning for Image Recognition
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
MedIm
1.3K
192,638
0
10 Dec 2015
Deep clustering: Discriminative embeddings for segmentation and
  separation
Deep clustering: Discriminative embeddings for segmentation and separation
J. Hershey
Zhuo Chen
Jonathan Le Roux
Shinji Watanabe
44
1,316
0
18 Aug 2015
Unsupervised Visual Representation Learning by Context Prediction
Unsupervised Visual Representation Learning by Context Prediction
Carl Doersch
Abhinav Gupta
Alexei A. Efros
DRL
SSL
145
2,777
0
19 May 2015
U-Net: Convolutional Networks for Biomedical Image Segmentation
U-Net: Convolutional Networks for Biomedical Image Segmentation
Olaf Ronneberger
Philipp Fischer
Thomas Brox
SSeg
3DV
1.1K
76,547
0
18 May 2015
Learning image representations tied to ego-motion
Learning image representations tied to ego-motion
Dinesh Jayaraman
Kristen Grauman
SSL
63
245
0
08 May 2015
Deep Karaoke: Extracting Vocals from Musical Mixtures Using a
  Convolutional Deep Neural Network
Deep Karaoke: Extracting Vocals from Musical Mixtures Using a Convolutional Deep Neural Network
Andrew J. R. Simpson
Gerard Roma
Mark D. Plumbley
53
102
0
17 Apr 2015
1