2505.22053
AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation
28 May 2025
Yan Rong
Jinting Wang
Shan Yang
Guangzhi Lei
Li Liu
DiffM
VGen
Papers citing
"AudioGenie: A Training-Free Multi-Agent Framework for Diverse Multimodality-to-Multiaudio Generation"
36 / 36 papers shown
Title
Dopamine Audiobook: A Training-free MLLM Agent for Emotional and Human-like Audiobook Generation
Yan Rong
Shan Yang
Guangzhi Lei
Li Liu
68
2
0
15 Apr 2025
Long-Video Audio Synthesis with Multi-Agent Collaboration
Yehang Zhang
Xinli Xu
Xiaojie Xu
L. Liu
Yuxiao Chen
DiffM
VGen
86
1
0
13 Mar 2025
Audio-Reasoner: Improving Reasoning Capability in Large Audio Language Models
Zhifei Xie
Mingbao Lin
Ziqiang Liu
Pengcheng Wu
Shuicheng Yan
Chunyan Miao
AuLLM
OffRL
LRM
116
14
0
04 Mar 2025
Qwen2.5-VL Technical Report
S. Bai
Keqin Chen
Xuejing Liu
Jialin Wang
Wenbin Ge
...
Zesen Cheng
Hang Zhang
Zhibo Yang
Haiyang Xu
Junyang Lin
VLM
287
528
0
20 Feb 2025
SongGen: A Single Stage Auto-regressive Transformer for Text-to-Song Generation
Ziqiang Liu
Shuangrui Ding
Zhixiong Zhang
Xiaoyi Dong
Pan Zhang
Yuhang Zang
Yuhang Cao
Dahua Lin
Jiaqi Wang
109
3
0
18 Feb 2025
Meta Audiobox Aesthetics: Unified Automatic Quality Assessment for Speech, Music, and Sound
Andros Tjandra
Yi-Chiao Wu
Baishan Guo
John Hoffman
Brian Ellis
...
Matt Le
Nick Zacharov
Carleigh Wood
Ann Lee
Wei-Ning Hsu
173
16
0
07 Feb 2025
GVMGen: A General Video-to-Music Generation Model with Hierarchical Attentions
H. Zuo
W. You
Junxian Wu
Shihong Ren
Pei Chen
Mingxu Zhou
Yaojie Lu
Lingyun Sun
VGen
49
4
0
20 Jan 2025
MMAudio: Taming Multimodal Joint Training for High-Quality Video-to-Audio Synthesis
Ho Kei Cheng
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Alex Schwing
Yuki Mitsufuji
VGen
254
17
0
19 Dec 2024
V2SFlow: Video-to-Speech Generation with Speech Decomposition and Rectified Flow
Jeongsoo Choi
Ji-Hoon Kim
Jiajian Li
Joon Son Chung
Shujie Liu
74
3
0
29 Nov 2024
SPAgent: Adaptive Task Decomposition and Model Selection for General Video Generation and Editing
Rong-Cheng Tu
Wenhao Sun
Zhao Jin
Jingyi Liao
Jiaxing Huang
Dacheng Tao
VGen
DiffM
144
6
0
28 Nov 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
149
5
0
23 Nov 2024
MuVi: Video-to-Music Generation with Semantic Alignment and Rhythmic Synchronization
Ruiqi Li
Siqi Zheng
Xize Cheng
Ziang Zhang
Shengpeng Ji
Zhou Zhao
VGen
100
8
0
16 Oct 2024
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching
Yushen Chen
Zhikang Niu
Ziyang Ma
Keqi Deng
Chunhui Wang
Jian Zhao
Kai Yu
Xie Chen
116
83
0
09 Oct 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
74
9
0
13 Sep 2024
SongCreator: Lyrics-based Universal Song Generation
Shun Lei
Yixuan Zhou
Boshi Tang
Max W. Y. Lam
Feng Liu
Hangyu Liu
Jingcheng Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
77
7
0
09 Sep 2024
FireRedTTS: A Foundation Text-To-Speech Framework for Industry-Level Generative Speech Applications
Hao-Han Guo
Kun Liu
Fei-Yu Shen
Yi-Chen Wu
Xu Tang
Kun Xie
Kai-Tuo Xu
75
26
0
05 Sep 2024
Seeing Your Speech Style: A Novel Zero-Shot Identity-Disentanglement Face-based Voice Conversion
Yan Rong
Li Liu
48
5
0
01 Sep 2024
VoxInstruct: Expressive Human Instruction-to-Speech Generation with Unified Multilingual Codec Language Modelling
Yixuan Zhou
Xiaoyu Qin
Zeyu Jin
Shuoyi Zhou
Shun Lei
Songtao Zhou
Zhiyong Wu
Jia Jia
AuLLM
95
8
0
28 Aug 2024
Anim-Director: A Large Multimodal Model Powered Agent for Controllable Animation Video Generation
Yunxin Li
Haoyuan Shi
Baotian Hu
Longyue Wang
Jiashun Zhu
Jinyi Xu
Zhen Zhao
Min Zhang
VGen
71
9
0
19 Aug 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
105
34
0
08 Jul 2024
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds
Yiming Zhang
Yicheng Gu
Yanhong Zeng
Zhening Xing
Yuancheng Wang
Zhizheng Wu
Kai Chen
VGen
65
41
0
01 Jul 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
119
5
0
06 Jun 2024
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
Yongqi Wang
Wenxiang Guo
Rongjie Huang
Jia-Bin Huang
Zehan Wang
Fuming You
Ruiqi Li
Zhou Zhao
VGen
DiffM
67
13
0
01 Jun 2024
Diff-BGM: A Diffusion Model for Video Background Music Generation
Sizhe Li
Yiming Qin
Minghang Zheng
Xin Jin
Yang Liu
DiffM
29
13
0
20 May 2024
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Navonil Majumder
Chia-Yu Hung
Deepanway Ghosal
Wei-Ning Hsu
Rada Mihalcea
Soujanya Poria
65
58
0
15 Apr 2024
Text-to-Song: Towards Controllable Music Generation Incorporating Vocals and Accompaniment
Zhiqing Hong
Rongjie Huang
Xize Cheng
Yongqi Wang
Ruiqi Li
Fuming You
Zhou Zhao
Zhimeng Zhang
54
10
0
14 Apr 2024
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing
Yin-Yin He
Zeyue Tian
Xintao Wang
Qifeng Chen
78
57
0
27 Feb 2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
Jinlong Xue
Yayue Deng
Yingming Gao
Ya Li
DiffM
62
35
0
02 Jan 2024
SyncFusion: Multimodal Onset-synchronized Video-to-Audio Foley Synthesis
Marco Comunità
R. F. Gramaccioni
Emilian Postolache
Emanuele Rodolà
Danilo Comminiello
Joshua D. Reiss
DiffM
54
17
0
23 Oct 2023
AutoAgents: A Framework for Automatic Agent Generation
Guangyao Chen
Siwei Dong
Yu Shu
Ge Zhang
Jaward Sesay
Börje F. Karlsson
Jie Fu
Yemin Shi
LLMAG
82
121
0
29 Sep 2023
Retrieval-Augmented Text-to-Audio Generation
Yiitan Yuan
Haohe Liu
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
RALM
53
28
0
14 Sep 2023
Music Understanding LLaMA: Advancing Text-to-Music Generation with Question Answering and Captioning
Shansong Liu
Atin Sakkeer Hussain
Chenshuo Sun
Yin Shan
MLLM
49
51
0
22 Aug 2023
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Kai Chen
Yusong Wu
Haohe Liu
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
DiffM
69
81
0
03 Aug 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
202
340
0
30 Jan 2023
Video Background Music Generation: Dataset, Method and Evaluation
Le Zhuo
Zhaokai Wang
Baisen Wang
Yue Liao
Chenxi Bao
Stanley Peng
Miao Lu
Xiaobo Li
Fei Fang
Si Liu
VGen
48
30
0
21 Nov 2022
Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer
Noam M. Shazeer
Azalia Mirhoseini
Krzysztof Maziarz
Andy Davis
Quoc V. Le
Geoffrey E. Hinton
J. Dean
MoE
248
2,644
0
23 Jan 2017