1512.08512
Visually Indicated Sounds
28 December 2015
Andrew Owens
Phillip Isola
Josh H. McDermott
Antonio Torralba
Edward H. Adelson
William T. Freeman
Papers citing "Visually Indicated Sounds"
50 / 206 papers shown
Title
Template-Free Try-on Image Synthesis via Semantic-guided Optimization
Chien-Lung Chou
Chieh-Yun Chen
Chia-Wei Hsieh
Hong-Han Shuai
Jiaying Liu
Wen-Huang Cheng
3DH
29
14
0
06 Feb 2021
ACAV100M: Automatic Curation of Large-Scale Datasets for Audio-Visual Video Representation Learning
Sangho Lee
Jiwan Chung
Youngjae Yu
Gunhee Kim
Thomas Breuel
Gal Chechik
Yale Song
71
45
0
26 Jan 2021
Learning rich touch representations through cross-modal self-supervision
Martina Zambelli
Y. Aytar
Francesco Visin
Yuxiang Zhou
R. Hadsell
SSL
34
16
0
21 Jan 2021
VisualVoice: Audio-Visual Speech Separation with Cross-Modal Consistency
Ruohan Gao
Kristen Grauman
CVBM
196
199
0
08 Jan 2021
Audio-Visual Floorplan Reconstruction
Senthil Purushwalkam
S. V. A. Garí
V. Ithapu
Carl Schissler
Philip Robinson
Abhinav Gupta
Kristen Grauman
VGen
3DV
65
41
0
31 Dec 2020
Parameter Efficient Multimodal Transformers for Video Representation Learning
Sangho Lee
Youngjae Yu
Gunhee Kim
Thomas Breuel
Jan Kautz
Yale Song
ViT
29
76
0
08 Dec 2020
Multi-Instrumentalist Net: Unsupervised Generation of Music from Body Movements
Kun Su
Xiulong Liu
Eli Shlizerman
32
28
0
07 Dec 2020
Sound Synthesis, Propagation, and Rendering: A Survey
Shiguang Liu
Tianyi Zhou
30
26
0
11 Nov 2020
Multi-Modal Learning of Keypoint Predictive Models for Visual Object Manipulation
Sarah Bechtle
Neha Das
Franziska Meier
SSL
24
4
0
08 Nov 2020
Listening to Sounds of Silence for Speech Denoising
Ruilin Xu
Rundi Wu
Y. Ishiwaka
Carl Vondrick
Changxi Zheng
28
32
0
22 Oct 2020
Audio-Visual Event Localization via Recursive Fusion by Joint Co-Attention
Bin Duan
Hao Tang
Wei Wang
Ziliang Zong
Guowei Yang
Yan Yan
33
59
0
14 Aug 2020
Self-Supervised Learning of Audio-Visual Objects from Video
Triantafyllos Afouras
Andrew Owens
Joon Son Chung
Andrew Zisserman
SSL
19
253
0
10 Aug 2020
Self-supervised Learning of Point Clouds via Orientation Estimation
Omid Poursaeed
Tianxing Jiang
Quintessa Qiao
N. Xu
Vladimir G. Kim
3DPC
SSL
16
116
0
01 Aug 2020
Noisy Agents: Self-supervised Exploration by Predicting Auditory Events
Chuang Gan
Xiaoyu Chen
Phillip Isola
Antonio Torralba
J. Tenenbaum
13
7
0
27 Jul 2020
Sound2Sight: Generating Visual Dynamics from Sound and Context
A. Cherian
Moitreya Chatterjee
Narendra Ahuja
VGen
77
35
0
23 Jul 2020
Foley Music: Learning to Generate Music from Videos
Chuang Gan
Deng Huang
Peihao Chen
J. Tenenbaum
Antonio Torralba
VGen
20
136
0
21 Jul 2020
Sep-Stereo: Visually Guided Stereophonic Audio Generation by Associating Source Separation
Hang Zhou
Xudong Xu
Dahua Lin
Xiaogang Wang
Ziwei Liu
DiffM
32
81
0
20 Jul 2020
Generating Visually Aligned Sound from Videos
Peihao Chen
Yang Zhang
Mingkui Tan
Hongdong Xiao
Deng Huang
Chuang Gan
VGen
24
95
0
14 Jul 2020
Do We Need Sound for Sound Source Localization?
Takashi Oya
Shohei Iwase
Ryota Natsume
Takahiro Itazuri
Shugo Yamaguchi
Shigeo Morishima
11
21
0
11 Jul 2020
Swoosh! Rattle! Thump! -- Actions that Sound
Dhiraj Gandhi
Abhinav Gupta
Lerrel Pinto
27
39
0
03 Jul 2020
Space-Time Correspondence as a Contrastive Random Walk
Allan Jabri
Andrew Owens
Alexei A. Efros
SSL
OT
28
292
0
25 Jun 2020
AVLnet: Learning Audio-Visual Language Representations from Instructional Videos
Andrew Rouditchenko
Angie Boggust
David Harwath
Brian Chen
D. Joshi
...
Rogerio Feris
Brian Kingsbury
M. Picheny
Antonio Torralba
James R. Glass
SSL
22
141
0
16 Jun 2020
Telling Left from Right: Learning Spatial Correspondence of Sight and Sound
Karren D. Yang
Bryan C. Russell
Justin Salamon
SSL
24
75
0
11 Jun 2020
VisualEchoes: Spatial Image Representation Learning through Echolocation
Ruohan Gao
Changan Chen
Ziad Al-Halah
Carl Schissler
Kristen Grauman
MDE
SSL
171
84
0
04 May 2020
Teaching Cameras to Feel: Estimating Tactile Physical Properties of Surfaces From Images
Matthew Purri
Kristin J. Dana
22
15
0
29 Apr 2020
VGGSound: A Large-scale Audio-Visual Dataset
Honglie Chen
Weidi Xie
Andrea Vedaldi
Andrew Zisserman
17
556
0
29 Apr 2020
Vocoder-Based Speech Synthesis from Silent Videos
Daniel Michelsanti
Olga Slizovskaia
G. Haro
Emilia Gómez
Zheng-Hua Tan
Jesper Jensen
31
31
0
06 Apr 2020
Deep Multimodal Feature Encoding for Video Ordering
Vivek Sharma
Makarand Tapaswi
Rainer Stiefelhagen
33
10
0
05 Apr 2020
Semantic Object Prediction and Spatial Sound Super-Resolution with Binaural Sounds
A. Vasudevan
Dengxin Dai
Luc Van Gool
ObjD
10
42
0
09 Mar 2020
Noise Estimation Using Density Estimation for Self-Supervised Multimodal Learning
Elad Amrani
Rami Ben-Ari
Daniel Rotman
A. Bronstein
19
121
0
06 Mar 2020
AutoFoley: Artificial Synthesis of Synchronized Sound Tracks for Silent Videos with Deep Learning
Sanchita Ghose
John J. Prevost
VGen
22
46
0
21 Feb 2020
Unsupervised Learning of Audio Perception for Robotics Applications: Learning to Project Data to T-SNE/UMAP space
Prateek Verma
J. Salisbury
SSL
9
4
0
10 Feb 2020
ImVoteNet: Boosting 3D Object Detection in Point Clouds with Image Votes
C. Qi
Xinlei Chen
Or Litany
Leonidas J. Guibas
3DPC
197
249
0
29 Jan 2020
Deep Audio-Visual Learning: A Survey
Hao Zhu
Mandi Luo
Rui Wang
A. Zheng
Ran He
31
156
0
14 Jan 2020
SoundSpaces: Audio-Visual Navigation in 3D Environments
Changan Chen
Unnat Jain
Carl Schissler
S. V. A. Garí
Ziad Al-Halah
V. Ithapu
Philip Robinson
Kristen Grauman
29
26
0
24 Dec 2019
Listen to Look: Action Recognition by Previewing Audio
Ruohan Gao
Tae-Hyun Oh
Kristen Grauman
Lorenzo Torresani
VLM
29
251
0
10 Dec 2019
Self-labelling via simultaneous clustering and representation learning
Yuki M. Asano
Christian Rupprecht
Andrea Vedaldi
SSL
48
762
0
13 Nov 2019
Dancing to Music
Hsin-Ying Lee
Xiaodong Yang
Xuan Li
Ting-Chun Wang
Yu-Ding Lu
Ming-Hsuan Yang
Jan Kautz
27
15
0
05 Nov 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
35
88
0
24 Oct 2019
Seeing and Hearing Egocentric Actions: How Much Can We Learn?
Alejandro Cartas
Jordi Luque
Petia Radeva
Carlos Segura
Mariella Dimiccoli
EgoV
24
20
0
15 Oct 2019
Towards Learning a Self-inverse Network for Bidirectional Image-to-image Translation
Zengming Shen
Yifan Chen
S. Kevin Zhou
Bogdan Georgescu
Xuqi Liu
Thomas S. Huang
SSL
MedIm
21
1
0
09 Sep 2019
Neural Re-Simulation for Generating Bounces in Single Images
Carlo Innamorati
Bryan C. Russell
D. Kaufman
Niloy J. Mitra
VGen
30
11
0
17 Aug 2019
Making Sense of Vision and Touch: Learning Multimodal Representations for Contact-Rich Tasks
Michelle A. Lee
Yuke Zhu
Peter Zachares
Matthew Tan
K. Srinivasan
Silvio Savarese
Fei-Fei Li
Animesh Garg
Jeannette Bohg
SSL
23
208
0
28 Jul 2019
Human detection of machine manipulated media
Matthew Groh
Ziv Epstein
Nick Obradovich
Manuel Cebrian
Iyad Rahwan
20
21
0
06 Jul 2019
Connecting Touch and Vision via Cross-Modal Prediction
Yunzhu Li
Jun-Yan Zhu
Russ Tedrake
Antonio Torralba
15
133
0
14 Jun 2019
Contrastive Multiview Coding
Yonglong Tian
Dilip Krishnan
Phillip Isola
SSL
100
2,372
0
13 Jun 2019
Learning Video Representations using Contrastive Bidirectional Transformer
Chen Sun
Fabien Baradel
Kevin Patrick Murphy
Cordelia Schmid
SSL
ViT
27
133
0
13 Jun 2019
Online Object Representations with Contrastive Learning
Soren Pirk
Mohi Khansari
Yunfei Bai
Corey Lynch
P. Sermanet
SSL
28
35
0
10 Jun 2019
How Much Does Audio Matter to Recognize Egocentric Object Interactions?
Alejandro Cartas
Jordi Luque
Petia Radeva
Carlos Segura
Mariella Dimiccoli
EgoV
20
6
0
03 Jun 2019
Speech2Face: Learning the Face Behind a Voice
Tae-Hyun Oh
Tali Dekel
Changil Kim
Inbar Mosseri
William T. Freeman
Michael Rubinstein
Wojciech Matusik
SSL
CVBM
33
163
0
23 May 2019