1512.08512
Visually Indicated Sounds
28 December 2015
Andrew Owens
Phillip Isola
Josh H. McDermott
Antonio Torralba
Edward H. Adelson
William T. Freeman
Papers citing
"Visually Indicated Sounds"
50 / 206 papers shown
Title
SounDiT: Geo-Contextual Soundscape-to-Landscape Generation
Junbo Wang
Haofeng Tan
Bowen Liao
Albert Jiang
Teng Fei
Qixing Huang
Zhengzhong Tu
Shan Ye
Yuhao Kang
22
0
0
19 May 2025
Learning to Highlight Audio by Watching Movies
Chao Huang
Ruohan Gao
J. M. F. Tsang
Jan Kurcius
Cagdas Bilen
Chenliang Xu
Anurag Kumar
Sanjeel Parekh
VGen
12
0
0
17 May 2025
HapticVLM: VLM-Driven Texture Recognition Aimed at Intelligent Haptic Interaction
Muhammad Haris Khan
Miguel Altamirano Cabrera
Dmitrii Iarchuk
Yara Mahmoud
Daria Trinitatova
Issatay Tokmurziyev
Dzmitry Tsetserukou
VLM
48
0
0
05 May 2025
KeyVID: Keyframe-Aware Video Diffusion for Audio-Synchronized Visual Animation
Xingrui Wang
Jiang-Long Liu
Zhilin Wang
Xiaodong Yu
Jialian Wu
Xingchen Sun
Yusheng Su
Alan Yuille
Zicheng Liu
Emad Barsoum
DiffM
VGen
51
0
0
13 Apr 2025
Visual Acoustic Fields
Yuelei Li
Hyunjin Kim
Fangneng Zhan
Ri-Zhao Qiu
Mazeyu Ji
Xiaojun Shan
Xueyan Zou
Paul Liang
Hanspeter Pfister
Xiaolong Wang
47
0
0
31 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Yu Guo
67
3
0
13 Mar 2025
TextToucher: Fine-Grained Text-to-Touch Generation
Jiahang Tu
Hao Fu
Fengyu Yang
Hanbin Zhao
Chao Zhang
Hui Qian
VLM
DiffM
91
9
0
10 Jan 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
Learning Self-Supervised Audio-Visual Representations for Sound Recommendations
Sudha Krishnamurthy
SSL
80
1
0
10 Dec 2024
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
83
0
0
09 Dec 2024
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
Haohe Liu
Gaël Le Lan
Xinhao Mei
Zhaoheng Ni
Anurag Kumar
Varun K. Nagaraja
Wenwu Wang
Mark D. Plumbley
Yangyang Shi
Vikas Chandra
VGen
64
1
0
03 Dec 2024
AV-Odyssey Bench: Can Your Multimodal LLMs Really Understand Audio-Visual Information?
Kaixiong Gong
Kaituo Feng
Yangqiu Song
Yibing Wang
Mofan Cheng
...
Jiaming Han
Benyou Wang
Yutong Bai
Zheng Yang
Xiangyu Yue
MLLM
AuLLM
VLM
91
6
0
03 Dec 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
93
3
0
23 Nov 2024
ANAVI: Audio Noise Awareness using Visuals of Indoor environments for NAVIgation
Vidhi Jain
Rishi Veerapaneni
Yonatan Bisk
41
0
0
24 Oct 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
45
6
0
27 Sep 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
70
4
0
26 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
38
4
0
22 Sep 2024
Sketching With Your Voice: "Non-Phonorealistic" Rendering of Sounds via Vocal Imitation
Matthew Caren
Kartik Chandra
J. Tenenbaum
Jonathan Ragan-Kelley
Karima Ma
43
0
0
20 Sep 2024
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
Yan-Bo Lin
Yu Tian
L. Yang
Gedas Bertasius
Heng Wang
VGen
34
7
0
11 Sep 2024
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Qi Yang
Binjie Mao
Zili Wang
Xing Nie
Pengfei Gao
Ying Guo
Cheng Zhen
Pengfei Yan
Shiming Xiang
VGen
DiffM
46
5
0
10 Sep 2024
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
Junwon Lee
Jaekwon Im
Dabin Kim
Juhan Nam
VGen
40
9
0
21 Aug 2024
Disentangled Acoustic Fields For Multimodal Physical Scene Understanding
Jie Yin
Andrew F. Luo
Yilun Du
A. Cherian
Tim K. Marks
Jonathan Le Roux
Chuang Gan
52
0
0
16 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
47
15
0
15 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
34
12
0
08 Jul 2024
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Yiming Zhang
Yicheng Gu
Yanhong Zeng
Zhening Xing
Yuancheng Wang
Zhizheng Wu
Kai Chen
VGen
37
37
0
01 Jul 2024
SonicSense: Object Perception from In-Hand Acoustic Vibration
Jiaxun Liu
Boyuan Chen
47
4
0
25 Jun 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen
Puyuan Peng
Ami Baid
Zihui Xue
Wei-Ning Hsu
David Harwath
Kristen Grauman
VGen
47
8
0
13 Jun 2024
Tactile-Augmented Radiance Fields
Yiming Dou
Fengyu Yang
Yi Liu
Antonio Loquercio
Andrew Owens
36
18
0
07 May 2024
SonicDiffusion: Audio-Driven Image Generation and Editing with Pretrained Diffusion Models
Burak Can Biner
Farrin Marouf Sofian
Umur Berkay Karakacs
Duygu Ceylan
Erkut Erdem
Aykut Erdem
23
8
0
01 May 2024
Guided Masked Self-Distillation Modeling for Distributed Multimedia Sensor Event Analysis
Masahiro Yasuda
Noboru Harada
Yasunori Ohishi
Shoichiro Saito
Akira Nakayama
Nobutaka Ono
36
3
0
12 Apr 2024
SoundingActions: Learning How Actions Sound from Narrated Egocentric Videos
Changan Chen
Kumar Ashutosh
Rohit Girdhar
David Harwath
Kristen Grauman
EgoV
SSL
33
7
0
08 Apr 2024
Audio-Synchronized Visual Animation
Lin Zhang
Shentong Mo
Yijing Zhang
Pedro Morgado
DiffM
48
20
0
08 Mar 2024
MultiPLY: A Multisensory Object-Centric Embodied Large Language Model in 3D World
Yining Hong
Zishuo Zheng
Peihao Chen
Yian Wang
Junyan Li
Chuang Gan
23
33
0
16 Jan 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
28
2
0
09 Jan 2024
Cross-modal Generative Model for Visual-Guided Binaural Stereo Generation
Zhaojian Li
Bin Zhao
Yuan Yuan
38
3
0
13 Nov 2023
Unsupervised Object Localization in the Era of Self-Supervised ViTs: A Survey
Oriane Siméoni
Éloi Zablocki
Spyros Gidaris
Gilles Puy
Patrick Pérez
33
10
0
19 Oct 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
29
21
0
19 Sep 2023
Learning to Taste: A Multimodal Wine Dataset
Thoranna Bender
Simon Moe Sorensen
A. Kashani
K. E. Hjorleifsson
Grethe Hyldig
Søren Hauberg
Serge Belongie
Frederik Warburg
CoGe
35
2
0
31 Aug 2023
AdVerb: Visually Guided Audio Dereverberation
Sanjoy Chowdhury
Sreyan Ghosh
Subhrajyoti Dasgupta
Anton Ratnarajah
Utkarsh Tyagi
Tianyi Zhou
30
11
0
23 Aug 2023
Example-Based Framework for Perceptually Guided Audio Texture Generation
Purnima Kamath
Chitralekha Gupta
L. Wyse
Suranga Nanayakkara
24
4
0
23 Aug 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
23
39
0
18 Aug 2023
Learning Spatial Features from Audio-Visual Correspondence in Egocentric Videos
Sagnik Majumder
Ziad Al-Halah
Kristen Grauman
SSL
EgoV
36
4
0
10 Jul 2023
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Simian Luo
Chuanhao Yan
Chenxu Hu
Hang Zhao
DiffM
34
80
0
29 Jun 2023
RealImpact: A Dataset of Impact Sound Fields for Real Objects
Samuel Clarke
Ruohan Gao
Mason Wang
M. Rau
Julia Xu
Jui-Hsien Wang
Doug L. James
Jiajun Wu
42
9
0
16 Jun 2023
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong
Xiaoyu Liu
Jordi Pons
Gautam Bhattacharya
Santiago Pascual
Joan Serrà
Taylor Berg-Kirkpatrick
Julian McAuley
DiffM
22
19
0
16 Jun 2023
Assessing Language Disorders using Artificial Intelligence: a Paradigm Shift
C. Themistocleous
K. Tsapkini
Dimitrios Kokkinakis
21
0
0
31 May 2023
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
Kun Su
Judith Yue Li
Qingqing Huang
Dima Kuzmin
Joonseok Lee
...
Fei Sha
A. Jansen
Yu Wang
Mauro Verzetti
Timo I. Denk
VGen
39
12
0
11 May 2023
Environmental sound synthesis from vocal imitations and sound event labels
Yuki Okamoto
Keisuke Imoto
Shinnosuke Takamichi
Ryotaro Nagase
Takahiro Fukumori
Y. Yamashita
20
0
0
29 Apr 2023
Conditional Generation of Audio from Video via Foley Analogies
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
25
38
0
17 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
30
2
0
12 Apr 2023