arXiv: 2308.05734
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
10 August 2023
Haohe Liu
Yi Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
Papers citing
"AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining"
50 / 168 papers shown
Title
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
Zehan Wang
Ke Lei
Chen Zhu
Jiawei Huang
Sashuai Zhou
...
Xize Cheng
Shengpeng Ji
Zhenhui Ye
Tao Jin
Zhou Zhao
29
0
0
15 May 2025
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Riccardo Passoni
Francesca Ronchini
Luca Comanducci
Romain Serizel
Fabio Antonacci
DiffM
38
0
0
12 May 2025
MusFlow: Multimodal Music Generation via Conditional Flow Matching
Jiahao Song
Yuzhao Wang
37
0
0
18 Apr 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
W. Dong
Changsheng Xu
69
0
0
17 Apr 2025
Generation of Musical Timbres using a Text-Guided Diffusion Model
Weixuan Yuan
Qadeer Khan
Vladimir Golkov
DiffM
31
0
0
12 Apr 2025
TARO: Timestep-Adaptive Representation Alignment with Onset-Aware Conditioning for Synchronized Video-to-Audio Synthesis
Tri Ton
Ji Woo Hong
Chang D. Yoo
VGen
24
0
0
08 Apr 2025
Policy Optimization Algorithms in a Unified Framework
Shuang Wu
39
0
0
04 Apr 2025
Random Conditioning with Distillation for Data-Efficient Diffusion Model Compression
Dohyun Kim
S. Park
Geonhee Han
Seung Wook Kim
Paul Hongsuck Seo
DiffM
58
0
0
02 Apr 2025
FreSca: Unveiling the Scaling Space in Diffusion Models
Chao Huang
Susan Liang
Yunlong Tang
Li Ma
Yapeng Tian
Chenliang Xu
DiffM
48
1
0
02 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kaipeng Zhang
MGen
VGen
70
1
0
01 Apr 2025
Visual Acoustic Fields
Yuelei Li
Hyunjin Kim
Fangneng Zhan
Ri-Zhao Qiu
Mazeyu Ji
Xiaojun Shan
Xueyan Zou
Paul Liang
Hanspeter Pfister
Xiaolong Wang
47
0
0
31 Mar 2025
CA^2ST: Cross-Attention in Audio, Space, and Time for Holistic Video Recognition
Jongseo Lee
Joohyun Chang
Dongho Lee
Jinwoo Choi
56
0
0
30 Mar 2025
DeepAudio-V1: Towards Multi-Modal Multi-Stage End-to-End Video to Speech and Audio Generation
Haomin Zhang
Chang Liu
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
DiffM
VGen
88
0
0
28 Mar 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
31
0
0
28 Mar 2025
Enhance Generation Quality of Flow Matching V2A Model via Multi-Step CoT-Like Guidance and Combined Preference Optimization
Haomin Zhang
Shri Kiran Srinivasan
Haoyu Wang
Zihao Chen
X. Liu
Chaofan Ding
Xinhan Di
34
0
0
28 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
74
1
0
27 Mar 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
89
0
0
26 Mar 2025
Imagine to Hear: Auditory Knowledge Generation can be an Effective Assistant for Language Models
Suho Yoo
Hyunjong Ok
Jaeho Lee
AuLLM
RALM
51
0
0
21 Mar 2025
Aligning Text-to-Music Evaluation with Human Preferences
Yichen Huang
Zachary Novack
Koichi Saito
Jiatong Shi
Shinji Watanabe
Yuki Mitsufuji
John Thickstun
Chris Donahue
EGVM
70
1
0
20 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
67
3
0
13 Mar 2025
FilmComposer: LLM-Driven Music Production for Silent Film Clips
Zhifeng Xie
Qile He
Youjia Zhu
Qiwei He
Mengtian Li
VGen
103
2
0
11 Mar 2025
Synchronized Video-to-Audio Generation via Mel Quantization-Continuum Decomposition
Juncheng Wang
Chao Xu
Cheng Yu
Lei Shang
Zhe Hu
Shujun Wang
Liefeng Bo
DiffM
VGen
48
0
0
10 Mar 2025
ReelWave: A Multi-Agent Framework Toward Professional Movie Sound Generation
Zixuan Wang
Chi-Keung Tang
Yu-Wing Tai
DiffM
VGen
63
0
0
10 Mar 2025
MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio
Xuenan Xu
Jiahao Mei
Chenliang Li
Yuning Wu
M. Yan
Shaopeng Lai
J.N. Zhang
Mengyue Wu
VGen
LLMAG
44
1
0
07 Mar 2025
Language Model Mapping in Multimodal Music Learning: A Grand Challenge Proposal
Daniel Y. Chin
Gus Xia
36
0
0
01 Mar 2025
PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao
Lei He
Haohan Guo
Fenglong Xie
Tan Lee
168
0
0
01 Mar 2025
InspireMusic: Integrating Super Resolution and Large Language Model for High-Fidelity Long-Form Music Generation
C. Zhang
Yukun Ma
Qian Chen
Wen Wang
Shengkui Zhao
...
Y. Jiang
Chaohong Tan
Zhifu Gao
Zhihao Du
B. Ma
55
0
0
28 Feb 2025
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao
Sizhou Chen
Linfeng Feng
Xiao-Lei Zhang
Xuelong Li
DiffM
MDE
68
1
0
26 Feb 2025
GCDance: Genre-Controlled 3D Full Body Dance Generation Driven By Music
Xinran Liu
Xu Dong
Diptesh Kanojia
Wenwu Wang
Zhenhua Feng
DiffM
62
0
0
25 Feb 2025
KAD: No More FAD! An Effective and Efficient Evaluation Metric for Audio Generation
Yoonjin Chung
Pilsun Eu
Junwon Lee
Keunwoo Choi
Juhan Nam
Ben Sangbae Chon
EGVM
62
3
0
21 Feb 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Ziqiang Liu
Shuangrui Ding
Zhixiong Zhang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Dahua Lin
Jiaqi Wang
81
0
0
18 Feb 2025
XAttnMark: Learning Robust Audio Watermarking with Cross-Attention
Yong-Jin Liu
Lie Lu
Jihui Jin
Lichao Sun
Andrea Fanelli
98
1
0
06 Feb 2025
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
Siyuan Hou
Shansong Liu
Ruibin Yuan
Wei Xue
Ying Shan
Mangsuo Zhao
Chao Zhang
87
3
0
17 Jan 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
74
7
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
Yi Yuan
Xubo Liu
Haohe Liu
Mark D. Plumbley
Wenwu Wang
52
3
0
10 Jan 2025
Generative AI for Cel-Animation: A Survey
Yunlong Tang
Junjia Guo
Pinxin Liu
Zhiyuan Wang
Hang Hua
...
Jing Bi
Mingqian Feng
Xuzhao Li
Zeliang Zhang
Chenliang Xu
VGen
90
7
0
08 Jan 2025
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
Yi Yuan
Dongya Jia
Xiaobin Zhuang
Yuanzhe Chen
Zhengxi Liu
...
Yansen Wang
Xubo Liu
Xiyuan Kang
Mark D. Plumbley
Wenwu Wang
VLM
55
4
0
03 Jan 2025
LoVA: Long-form Video-to-Audio Generation
Xin Cheng
Xihua Wang
Yihan Wu
Yuyue Wang
Ruihua Song
VGen
DiffM
48
3
0
31 Dec 2024
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
A. Schwing
Yuki Mitsufuji
VGen
126
12
0
19 Dec 2024
VinTAGe: Joint Video and Text Conditioning for Holistic Audio Generation
Saksham Singh Kushwaha
Yapeng Tian
DiffM
VGen
87
2
0
14 Dec 2024
Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise
Tornike Karchkhadze
Keren Shao
Shlomo Dubnov
75
0
0
12 Dec 2024
SyncFlow: Toward Temporally Aligned Joint Audio-Video Generation from Text
Haohe Liu
Gaël Le Lan
Xinhao Mei
Zhaoheng Ni
Anurag Kumar
Varun K. Nagaraja
Wenwu Wang
Mark D. Plumbley
Yangyang Shi
Vikas Chandra
VGen
64
1
0
03 Dec 2024
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
95
5
0
02 Dec 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
77
9
0
29 Nov 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
90
3
0
23 Nov 2024
Scaling Concept With Text-Guided Diffusion Models
Chao Huang
Susan Liang
Yunlong Tang
Yapeng Tian
Anurag Kumar
Chenliang Xu
DiffM
56
6
0
31 Oct 2024
MusicFlow: Cascaded Flow Matching for Text Guided Music Generation
K R Prajwal
Bowen Shi
Matthew Lee
Apoorv Vyas
Andros Tjandra
...
Baishan Guo
Huiyu Wang
Triantafyllos Afouras
David Kant
Wei-Ning Hsu
43
5
0
27 Oct 2024
Challenge on Sound Scene Synthesis: Evaluating Text-to-Audio Generation
Junwon Lee
Modan Tailleur
Laurie M. Heller
Keunwoo Choi
Mathieu Lagrange
Brian McFee
Keisuke Imoto
Yuki Okamoto
20
4
0
23 Oct 2024
Construction and Analysis of Impression Caption Dataset for Environmental Sounds
Yuki Okamoto
Ryotaro Nagase
Minami Okamoto
Yuki Saito
Keisuke Imoto
Takahiro Fukumori
Y. Yamashita
26
0
0
20 Oct 2024