2408.11915
Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound
21 August 2024
Junwon Lee
Jaekwon Im
Dabin Kim
Juhan Nam
VGen
Papers citing
"Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound"
13 / 13 papers shown
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
176
15
0
19 Dec 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
57
15
0
15 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
57
14
0
08 Jul 2024
PEAVS: Perceptual Evaluation of Audio-Visual Synchrony Grounded in Viewers' Opinion Scores
Lucas Goncalves
Prashant Mathur
Chandrashekhar Lavania
Metehan Cekic
Marcello Federico
Kyu J. Han
44
4
0
10 Apr 2024
A Demand-Driven Perspective on Generative Audio AI
Sangshin Oh
Minsung Kang
Hyeongi Moon
Keunwoo Choi
Ben Sangbae Chon
40
3
0
10 Jul 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
78
204
0
30 Mar 2023
Adding Conditional Control to Text-to-Image Diffusion Models
Lvmin Zhang
Anyi Rao
Maneesh Agrawala
AI4CE
77
4,015
1
10 Feb 2023
Taming Visually Guided Sound Generation
Vladimir E. Iashin
Esa Rahtu
VLM
61
125
0
17 Oct 2021
FoleyGAN: Visually Guided Generative Adversarial Network-Based Synchronous Sound Generation in Silent Videos
Sanchita Ghose
John J. Prevost
GAN
45
26
0
20 Jul 2021
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
111
1,068
0
21 Dec 2019
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset
João Carreira
Andrew Zisserman
206
7,961
0
22 May 2017
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
94
2,488
0
29 Sep 2016
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
311
7,361
0
12 Sep 2016