Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2110.08791
Cited By
Taming Visually Guided Sound Generation
17 October 2021
Vladimir E. Iashin
Esa Rahtu
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Taming Visually Guided Sound Generation"
44 / 94 papers shown
Title
MoLA: Motion Generation and Editing with Latent Diffusion Enhanced by Adversarial Training
Kengo Uchida
Takashi Shibuya
Yuhta Takida
Naoki Murata
Shusuke Takahashi
Shusuke Takahashi
Yuki Mitsufuji
VGen
137
5
0
04 Jun 2024
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
Yongqi Wang
Wenxiang Guo
Rongjie Huang
Jia-Bin Huang
Zehan Wang
Fuming You
Ruiqi Li
Zhou Zhao
VGen
DiffM
130
13
0
01 Jun 2024
Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI
Che Liu
Changde Du
Xiaoyu Chen
Huiguang He
67
2
0
29 May 2024
C3LLM: Conditional Multimodal Content Generation Using Large Language Models
Zixuan Wang
Qinkai Duan
Yu-Wing Tai
Chi-Keung Tang
109
3
0
25 May 2024
Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation
Shiqi Yang
Zhi-Wei Zhong
Mengjie Zhao
Shusuke Takahashi
Masato Ishii
Takashi Shibuya
Yuki Mitsufuji
89
4
0
23 May 2024
A Versatile Diffusion Transformer with Mixture of Noise Levels for Audiovisual Generation
Gwanghyun Kim
Alonso Martinez
Yu-Chuan Su
Brendan Jou
José Lezama
...
Lijun Yu
Lu Jiang
A. Jansen
Jacob Walker
Krishna Somandepalli
74
9
0
22 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
158
9
0
20 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
91
29
0
30 Apr 2024
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Gehui Chen
Guan’an Wang
Xiaowen Huang
Jitao Sang
VGen
63
8
0
25 Apr 2024
TAVGBench: Benchmarking Text to Audible-Video Generation
Yuxin Mao
Xuyang Shen
Jing Zhang
Zhen Qin
Jinxing Zhou
Mochu Xiang
Yiran Zhong
Yuchao Dai
77
12
0
22 Apr 2024
Text-to-Audio Generation Synchronized with Videos
Shentong Mo
Jing Shi
Yapeng Tian
DiffM
VGen
88
18
0
08 Mar 2024
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing
Yin-Yin He
Zeyue Tian
Xintao Wang
Qifeng Chen
116
57
0
27 Feb 2024
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Hila Manor
T. Michaeli
DiffM
100
29
0
15 Feb 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
61
2
0
09 Jan 2024
Controllable Music Production with Diffusion Models and Guidance Gradients
Mark Levy
Bruno Di Giorgi
Floris Weers
Angelos Katharopoulos
Tom Nickson
DiffM
117
23
0
01 Nov 2023
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
Marco Comunità
R. F. Gramaccioni
Emilian Postolache
Emanuele Rodolà
Danilo Comminiello
Joshua D. Reiss
DiffM
65
17
0
23 Oct 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
88
23
0
19 Sep 2023
Retrieval-Augmented Text-to-Audio Generation
Yiitan Yuan
Haohe Liu
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
RALM
80
28
0
14 Sep 2023
DCTTS: Discrete Diffusion Model with Contrastive Learning for Text-to-speech Generation
Zhichao Wu
Qiulin Li
Sixing Liu
Qun Yang
67
3
0
13 Sep 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
110
43
0
18 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
128
246
0
10 Aug 2023
Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models
Simian Luo
Chuanhao Yan
Chenxu Hu
Hang Zhao
DiffM
105
83
0
29 Jun 2023
Text-Driven Foley Sound Generation With Latent Diffusion Model
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Peipei Wu
Mark D.Plumbley
Wenwu Wang
DiffM
108
10
0
17 Jun 2023
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong
Xiaoyu Liu
Jordi Pons
Gautam Bhattacharya
Santiago Pascual
Joan Serrà
Taylor Berg-Kirkpatrick
Julian McAuley
DiffM
86
20
0
16 Jun 2023
The ObjectFolder Benchmark: Multisensory Learning with Neural and Real Objects
Ruohan Gao
Yiming Dou
Hao Li
Tanmay Agarwal
Jeannette Bohg
Yunzhu Li
Li Fei-Fei
Jiajun Wu
64
35
0
01 Jun 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
101
21
0
22 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
235
132
0
04 May 2023
Diverse and Vivid Sound Generation from Text Descriptions
Guangwei Li
Xuenan Xu
Lingfeng Dai
Mengyue Wu
K. Yu
95
4
0
03 May 2023
Conditional Generation of Audio from Video via Foley Analogies
Yuexi Du
Ziyang Chen
Justin Salamon
Bryan C. Russell
Andrew Owens
VGen
73
40
0
17 Apr 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
86
39
0
30 Mar 2023
Sounding Video Generator: A Unified Framework for Text-guided Sounding Video Generation
Jiawei Liu
Weining Wang
Sihan Chen
Xinxin Zhu
Qingbin Liu
DiffM
VGen
78
14
0
29 Mar 2023
A Survey on Audio Diffusion Models: Text To Speech Synthesis and Enhancement in Generative AI
Chenshuang Zhang
Chaoning Zhang
Sheng Zheng
Mengchun Zhang
Maryam Qamar
Sung-Ho Bae
In So Kweon
DiffM
MedIm
116
73
0
23 Mar 2023
Sound Localization from Motion: Jointly Learning Sound Direction and Camera Rotation
Ziyang Chen
Shengyi Qian
Andrew Owens
101
13
0
20 Mar 2023
Leveraging Pre-trained AudioLDM for Text to Sound Generation: A Benchmark Study
Yiitan Yuan
Haohe Liu
Jinhua Liang
Xubo Liu
Mark D. Plumbley
Wenwu Wang
52
10
0
07 Mar 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
89
102
0
31 Jan 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
233
344
0
30 Jan 2023
VarietySound: Timbre-Controllable Video to Sound Generation via Unsupervised Information Disentanglement
Chenye Cui
Yi Ren
Jinglin Liu
Rongjie Huang
Zhou Zhao
VGen
81
14
0
19 Nov 2022
I Hear Your True Colors: Image Guided Audio Generation
Roy Sheffer
Yossi Adi
VLM
80
76
0
06 Nov 2022
Learning in Audio-visual Context: A Review, Analysis, and New Perspective
Yake Wei
Di Hu
Yapeng Tian
Xuelong Li
127
55
0
20 Aug 2022
A Proposal for Foley Sound Synthesis Challenge
Keunwoo Choi
Sangshin Oh
Minsung Kang
Brian McFee
53
11
0
21 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
104
306
0
20 Jul 2022
Learning Visual Styles from Audio-Visual Associations
Tingle Li
Yichen Liu
Andrew Owens
Hang Zhao
DiffM
73
22
0
10 May 2022
Quantized GAN for Complex Music Generation from Dance Videos
Ye Zhu
Kyle Olszewski
Yuehua Wu
Panos Achlioptas
Menglei Chai
Yan Yan
Sergey Tulyakov
MGen
108
46
0
01 Apr 2022
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
59
26
0
20 Jul 2021
Previous
1
2