2110.08791
Cited By
Taming Visually Guided Sound Generation
17 October 2021
Vladimir E. Iashin
Esa Rahtu
VLM
Papers citing
"Taming Visually Guided Sound Generation"
50 / 94 papers shown
Title
ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim
Heeseung Yun
Gunhee Kim
VGen
13
2
0
13 Jun 2025
Hearing Hands: Generating Sounds from Physical Interactions in 3D Scenes
Yiming Dou
Wonseok Oh
Yuqing Luo
Antonio Loquercio
Andrew Owens
59
0
0
11 Jun 2025
A Review on Score-based Generative Models for Audio Applications
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM
MedIm
29
0
0
10 Jun 2025
Average Calibration Losses for Reliable Uncertainty in Medical Image Segmentation
Theodore Barfoot
Luis C. Garcia-Peraza-Herrera
Samet Akcay
Ben Glocker
Tom Vercauteren
UQCV
127
0
0
04 Jun 2025
Towards Video to Piano Music Generation with Chain-of-Perform Support Benchmarks
Chang Liu
Haomin Zhang
Shiyu Xia
Zihao Chen
Chaofan Ding
Xin Yue
Huizhe Chen
Xinhan Di
48
0
0
26 May 2025
SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet
Zhi-Wei Zhong
Akira Takahashi
Shuyang Cui
Keisuke Toyama
Shusuke Takahashi
Yuki Mitsufuji
VGen
45
0
0
22 May 2025
Sat2Sound: A Unified Framework for Zero-Shot Soundscape Mapping
Subash Khanal
Srikumar Sastry
Aayush Dhakal
Adeel Ahmad
Nathan Jacobs
76
0
0
19 May 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
113
1
0
21 Apr 2025
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton
Ji Woo Hong
Chang D. Yoo
VGen
67
0
0
08 Apr 2025
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Haomin Zhang
Siyang Song
Haoyu Wang
Zihao Chen
Xianglong Liu
Chaofan Ding
Xinhan Di
75
0
0
28 Mar 2025
DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos
Yunming Liang
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
109
0
0
28 Mar 2025
DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
Haomin Zhang
Chang Liu
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
152
0
0
28 Mar 2025
Chirp Localization via Fine-Tuned Transformer Model: A Proof-of-Concept Study
N. Bahador
M. Lankarany
119
0
0
24 Mar 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
104
1
0
15 Mar 2025
Long-Video Audio Synthesis with Multi-Agent Collaboration
Yehang Zhang
Xinli Xu
Xiaojie Xu
L. Liu
Yuxiao Chen
DiffM
VGen
103
1
0
13 Mar 2025
TA-V2A: Textually Assisted Video-to-Audio Generation
Yuhuan You
Xihong Wu
T. Qu
DiffM
105
0
0
12 Mar 2025
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Juncheng Wang
Chao Xu
Cheng Yu
Lei Shang
Zhe Hu
Shujun Wang
Liefeng Bo
DiffM
VGen
94
0
0
10 Mar 2025
ReelWave: Multi-Agentic Movie Sound Generation through Multimodal LLM Conversation
Zixuan Wang
Chi-Keung Tang
Yu-Wing Tai
VGen
DiffM
127
0
0
10 Mar 2025
Generative AI for Cel-Animation: A Survey
Yunlong Tang
Junjia Guo
Pinxin Liu
Zhiyuan Wang
Hang Hua
...
Jing Bi
Mingqian Feng
Xuzhao Li
Zeliang Zhang
Chenliang Xu
VGen
155
7
0
08 Jan 2025
LoVA: Long-form Video-to-Audio Generation
Xin Cheng
Xihua Wang
Yihan Wu
Yuyue Wang
Ruihua Song
VGen
DiffM
112
3
0
31 Dec 2024
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance
Yaoyun Zhang
Xuenan Xu
Mengyue Wu
VGen
97
1
0
24 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
285
18
0
19 Dec 2024
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha
Yapeng Tian
DiffM
VGen
125
2
0
14 Dec 2024
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
203
0
0
09 Dec 2024
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
Haohe Liu
Gaël Le Lan
Xinhao Mei
Zhaoheng Ni
Anurag Kumar
Varun K. Nagaraja
Wenwu Wang
Mark D. Plumbley
Yangyang Shi
Vikas Chandra
VGen
157
1
0
03 Dec 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
173
5
0
23 Nov 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
121
9
0
16 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
DiffM
VGen
LLMAG
109
4
0
04 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
93
3
0
03 Oct 2024
From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
158
7
0
27 Sep 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
161
4
0
26 Sep 2024
Self-Supervised Audio-Visual Soundscape Stylization
Tingle Li
Renhao Wang
Po-Yao Huang
Andrew Owens
Gopala Anumanchipalli
DiffM
SSL
89
5
0
22 Sep 2024
Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
VGen
79
13
0
20 Sep 2024
Efficient Video to Audio Mapper with Visual Scene Detection
Mingjing Yi
Ming Li
VGen
96
3
0
15 Sep 2024
Text Prompt is Not Enough: Sound Event Enhanced Prompt Adapter for Target Style Audio Generation
Chenxu Xiong
Ruibo Fu
Shuchen Shi
Zhengqi Wen
Jianhua Tao
...
Chunyu Qiang
Yuankun Xie
Xin Qi
Guanjun Li
Zizheng Yang
DiffM
82
0
0
14 Sep 2024
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
Zhiqi Huang
Dan Luo
Jun Wang
Huan Liao
Zhiheng Li
Zhiyong Wu
VGen
86
4
0
13 Sep 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
99
9
0
13 Sep 2024
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
Yan-Bo Lin
Yu Tian
L. Yang
Gedas Bertasius
Heng Wang
VGen
75
8
0
11 Sep 2024
Draw an Audio: Leveraging Multi-Instruction for Video-to-Audio Synthesis
Qi Yang
Binjie Mao
Zili Wang
Xing Nie
Pengfei Gao
Ying Guo
Cheng Zhen
Pengfei Yan
Shiming Xiang
VGen
DiffM
86
5
0
10 Sep 2024
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
Junwon Lee
Jaekwon Im
Dabin Kim
Juhan Nam
VGen
129
10
0
21 Aug 2024
PSM: Learning Probabilistic Embeddings for Multi-scale Zero-Shot Soundscape Mapping
Subash Khanal
Eric Xing
Srikumar Sastry
Aayush Dhakal
Zhexiao Xiong
Adeel Ahmad
Nathan Jacobs
97
3
0
13 Aug 2024
EgoSonics: Generating Synchronized Audio for Silent Egocentric Videos
Aashish Rai
Srinath Sridhar
DiffM
75
4
0
30 Jul 2024
MEDIC: Zero-shot Music Editing with Disentangled Inversion Control
Huadai Liu
Jialei Wang
Rongjie Huang
Yang Liu
Jiayang Xu
Zhou Zhao
71
4
0
18 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
95
16
0
15 Jul 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Dong Yu
DiffM
VGen
111
12
0
10 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
84
15
0
08 Jul 2024
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Yiming Zhang
Yicheng Gu
Yanhong Zeng
Zhening Xing
Yuancheng Wang
Zhizheng Wu
Kai Chen
VGen
100
41
0
01 Jul 2024
SpecMaskGIT: Masked Generative Modeling of Audio Spectrograms for Efficient Audio Synthesis and Beyond
Marco Comunità
Zhi-Wei Zhong
Akira Takahashi
Shiqi Yang
Mengjie Zhao
Koichi Saito
Yukara Ikemiya
Takashi Shibuya
Shusuke Takahashi
Yuki Mitsufuji
110
6
0
25 Jun 2024
Action2Sound: Ambient-Aware Generation of Action Sounds from Egocentric Videos
Changan Chen
Puyuan Peng
Ami Baid
Zihui Xue
Wei-Ning Hsu
David Harwath
Kristen Grauman
VGen
80
8
0
13 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
136
6
0
06 Jun 2024