SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet

22 May 2025

Papers citing "SpecMaskFoley: Steering Pretrained Spectral Masked Generative Transformer Toward Synchronized Video-to-audio Synthesis via ControlNet"

4 / 4 papers shown

Title
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer Siyuan Hou Shansong Liu Ruibin Yuan Wei Xue Ying Shan Mangsuo Zhao Chao Zhang 143 6 0 17 Jan 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations Ruben Ciranni Emilian Postolache Giorgio Mariani Michele Mancusi Giorgio Fabbro Emanuele Rodolà Luca Cosmo 275 8 0 10 Jan 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis Ho Kei Cheng Masato Ishii Akio Hayakawa Takashi Shibuya Alex Schwing Yuki Mitsufuji VGen 285 18 0 19 Dec 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks Weihsiang Liao Yuhta Takida Yukara Ikemiya Zhi-Wei Zhong Chieh-Hsin Lai ... Stefan Uhlich Taketo Akama Woosung Choi Yuichiro Koyama Yuki Mitsufuji 215 1 0 02 Nov 2024