ResearchTrend.AI
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds

1 July 2024
Yiming Zhang
Yicheng Gu
Yanhong Zeng
Zhening Xing
Yuancheng Wang
Zhizheng Wu
Kai Chen
    VGen
arXiv: 2407.01494

Papers citing "FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds"

34 / 34 papers shown
Hearing from Silence: Reasoning Audio Descriptions from Silent Videos via Vision-Language Model
Yong Ren
Chenxing Li
Le Xu
Hao Gu
Duzhen Zhang
Yujie Chen
Manjie Xu
Ruibo Fu
Shan Yang
Dong Yu
LRM
7
0
0
19 May 2025
OmniAudio: Generating Spatial Audio from 360-Degree Video
Huadai Liu
Tianyi Luo
Qikai Jiang
Kaicheng Luo
Peiwen Sun
...
Xin Li
Shiliang Zhang
Zhijie Yan
Zhou Zhao
Wei Xue
VGen
58
0
0
21 Apr 2025
Extending Visual Dynamics for Video-to-Music Generation
Xiaohao Liu
Teng Tu
Yunshan Ma
Tat-Seng Chua
VGen
64
0
0
10 Apr 2025
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton
Ji Woo Hong
Chang D. Yoo
VGen
29
0
0
08 Apr 2025
Visual Acoustic Fields
Yuelei Li
Hyunjin Kim
Fangneng Zhan
Ri-Zhao Qiu
Mazeyu Ji
Xiaojun Shan
Xueyan Zou
Paul Liang
Hanspeter Pfister
Xiaolong Wang
47
0
0
31 Mar 2025
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Haomin Zhang
Shri Kiran Srinivasan
Haoyu Wang
Zihao Chen
Xianglong Liu
Chaofan Ding
Xinhan Di
39
0
0
28 Mar 2025
DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
Haomin Zhang
Chang Liu
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
88
0
0
28 Mar 2025
DeepSound-V1: Start to Think Step-by-Step in the Audio Generation from Videos
Yunming Liang
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
60
0
0
28 Mar 2025
Long-Video Audio Synthesis with Multi-Agent Collaboration
Yehang Zhang
Xinli Xu
Xiaojie Xu
L. Liu
Yuxiao Chen
DiffM
VGen
53
0
0
13 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Yu Guo
67
3
0
13 Mar 2025
TA-V2A: Textually Assisted Video-to-Audio Generation
Yuhuan You
Xihong Wu
T. Qu
DiffM
50
0
0
12 Mar 2025
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Juncheng Wang
Chao Xu
Cheng Yu
Lei Shang
Zhe Hu
Shujun Wang
Liefeng Bo
DiffM
VGen
48
0
0
10 Mar 2025
ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation
Zixuan Wang
Chi-Keung Tang
Yu-Wing Tai
DiffM
VGen
63
0
0
10 Mar 2025
LoVA: Long-form Video-to-Audio Generation
Xin Cheng
Xihua Wang
Yihan Wu
Yuyue Wang
Ruihua Song
VGen
DiffM
48
3
0
31 Dec 2024
Smooth-Foley: Creating Continuous Sound for Video-to-Audio Generation Under Semantic Guidance
Yaoyun Zhang
Xuenan Xu
Mengyue Wu
VGen
36
0
0
24 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha
Yapeng Tian
DiffM
VGen
87
2
0
14 Dec 2024
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
83
0
0
09 Dec 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
93
3
0
23 Nov 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
71
7
0
16 Oct 2024
Diff-SAGe: End-to-End Spatial Audio Generation Using Diffusion Models
Saksham Singh Kushwaha
Jianbo Ma
Mark R. P. Thomas
Yapeng Tian
Avery Bruni
37
1
0
15 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
DiffM
VGen
LLMAG
49
4
0
04 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
44
3
0
03 Oct 2024
Video-to-Audio Generation with Fine-grained Temporal Semantics
Yuchen Hu
Yu Gu
Chenxing Li
Rilin Chen
Dong Yu
VGen
DiffM
29
1
0
23 Sep 2024
Temporally Aligned Audio for Video with Autoregression
Ilpo Viertola
Vladimir E. Iashin
Esa Rahtu
VGen
50
11
0
20 Sep 2024
Efficient Video to Audio Mapper with Visual Scene Detection
Mingjing Yi
Ming Li
VGen
32
3
0
15 Sep 2024
Effective Pre-Training of Audio Transformers for Sound Event Detection
Florian Schmid
T. Morocutti
Francesco Foscarin
Jan Schluter
Paul Primus
Gerhard Widmer
ViT
33
2
0
14 Sep 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
49
1
0
14 Sep 2024
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
Zhiqi Huang
Dan Luo
Jun Wang
Huan Liao
Zhiheng Li
Zhiyong Wu
VGen
53
4
0
13 Sep 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
48
7
0
13 Sep 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Dong Yu
DiffM
VGen
43
12
0
10 Jul 2024
Audio-Synchronized Visual Animation
Lin Zhang
Shentong Mo
Yijing Zhang
Pedro Morgado
DiffM
45
20
0
08 Mar 2024
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
152
144
0
24 Apr 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
151
317
0
30 Jan 2023