arXiv:2301.12503
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
29 January 2023
Haohe Liu
Zehua Chen
Yi Yuan
Xinhao Mei
Xubo Liu
Danilo P. Mandic
Wenwu Wang
Mark D. Plumbley
DiffM
Papers citing "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models"
50 / 91 papers shown
Diffused Responsibility: Analyzing the Energy Consumption of Generative Text-to-Audio Diffusion Models
Riccardo Passoni
Francesca Ronchini
Luca Comanducci
Romain Serizel
Fabio Antonacci
DiffM
38
0
0
12 May 2025
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Paul Primus
Florian Schmid
Gerhard Widmer
CLIP
AI4TS
VLM
36
0
0
12 May 2025
FLAM: Frame-Wise Language-Audio Modeling
Yusong Wu
Christos Tsirigotis
Ke Chen
Cheng-Zhi Anna Huang
Aaron C. Courville
Oriol Nieto
Prem Seetharaman
Justin Salamon
50
0
0
08 May 2025
SonicRAG : High Fidelity Sound Effects Synthesis Based on Retrival Augmented Generation
Yu-Ren Guo
Wen-Kai Tai
57
0
0
06 May 2025
CoCoDiff: Diversifying Skeleton Action Features via Coarse-Fine Text-Co-Guided Latent Diffusion
Zhifu Zhao
Hanyang Hua
J. Li
Shaoxin Wu
Fu Li
Yangtao Zhou
Yang Li
DiffM
68
0
0
30 Apr 2025
Sparse-to-Sparse Training of Diffusion Models
Inês Cardoso Oliveira
Decebal Constantin Mocanu
Luis A. Leiva
DiffM
86
0
0
30 Apr 2025
TrueFake: A Real World Case Dataset of Last Generation Fake Images also Shared on Social Networks
S. Dell’Anna
Andrea Montibeller
Giulia Boato
62
0
0
29 Apr 2025
Temporal Attention Pooling for Frequency Dynamic Convolution in Sound Event Detection
Hyeonuk Nam
Yong-Hwa Park
33
0
0
17 Apr 2025
SlimSpeech: Lightweight and Efficient Text-to-Speech with Slim Rectified Flow
Kaidi Wang
Wenhao Guan
Shenghui Lu
Jianglong Yao
Lin Li
Q. Hong
32
0
0
10 Apr 2025
LoopGen: Training-Free Loopable Music Generation
Davide Marincione
Giorgio Strano
Donato Crisostomi
Roberto Ribuoli
Emanuele Rodolà
MGen
60
0
0
06 Apr 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
50
0
0
15 Mar 2025
DexGrasp Anything: Towards Universal Robotic Dexterous Grasping with Physics Awareness
Yiming Zhong
Qi Jiang
Jingyi Yu
Yuexin Ma
58
2
0
11 Mar 2025
DGFM: Full Body Dance Generation Driven by Music Foundation Models
Xinran Liu
Zhenhua Feng
Diptesh Kanojia
Wenwu Wang
DiffM
66
1
0
27 Feb 2025
RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior
Ching Hua Lee
Chouchang Yang
Jaejin Cho
Yashas Malur Saidutta
R. S. Srinivasa
Yilin Shen
Hongxia Jin
DiffM
85
0
0
19 Feb 2025
TokenSynth: A Token-based Neural Synthesizer for Instrument Cloning and Text-to-Instrument
Kyungsu Kim
Junghyun Koo
Sungho Lee
Haesun Joung
Kyogu Lee
58
0
0
13 Feb 2025
A Reversible Solver for Diffusion SDEs
Zander Blasingame
Chen Liu
DiffM
54
0
0
12 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
125
2
0
07 Feb 2025
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
Siyuan Hou
Shansong Liu
Ruibin Yuan
Wei Xue
Ying Shan
Mangsuo Zhao
Chao Zhang
87
3
0
17 Jan 2025
COCOLA: Coherence-Oriented Contrastive Learning of Musical Audio Representations
Ruben Ciranni
Emilian Postolache
Giorgio Mariani
Michele Mancusi
Giorgio Fabbro
Emanuele Rodolà
Luca Cosmo
74
7
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
FlowSep: Language-Queried Sound Separation with Rectified Flow Matching
Yi Yuan
Xubo Liu
Haohe Liu
Mark D. Plumbley
Wenwu Wang
52
3
0
10 Jan 2025
Generative AI for Cel-Animation: A Survey
Yunlong Tang
Junjia Guo
Pinxin Liu
Zhiyuan Wang
Hang Hua
...
Jing Bi
Mingqian Feng
Xuzhao Li
Zeliang Zhang
Chenliang Xu
VGen
88
7
0
08 Jan 2025
Rare-to-Frequent: Unlocking Compositional Generation Power of Diffusion Models on Rare Concepts with LLM Guidance
Dongmin Park
Sebin Kim
Taehong Moon
Minkyu Kim
Kangwook Lee
Jaewoong Cho
DiffM
CoGe
64
2
0
08 Jan 2025
Text2Data: Low-Resource Data Generation with Textual Control
Shiyu Wang
Yihao Feng
Tian Lan
Ning Yu
Yu Bai
Ran Xu
Hairu Wang
Caiming Xiong
Shri Kiran Srinivasan
DiffM
85
0
0
03 Jan 2025
Simultaneous Music Separation and Generation Using Multi-Track Latent Diffusion Models
Tornike Karchkhadze
M. Izadi
Shlomo Dubnov
DiffM
44
2
0
31 Dec 2024
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
Chia-Yu Hung
Navonil Majumder
Zhifeng Kong
Ambuj Mehrish
Rafael Valle
Bryan Catanzaro
Soujanya Poria
52
5
0
30 Dec 2024
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
66
2
0
14 Nov 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
62
0
0
14 Oct 2024
Presto! Distilling Steps and Layers for Accelerating Music Generation
Zachary Novack
Ge Zhu
Jonah Casebeer
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
45
5
0
07 Oct 2024
MDSGen: Fast and Efficient Masked Diffusion Temporal-Aware Transformers for Open-Domain Sound Generation
T. Pham
Tri Ton
Chang D. Yoo
41
3
0
03 Oct 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
49
2
0
02 Oct 2024
A Simple but Strong Baseline for Sounding Video Generation: Effective Adaptation of Audio and Video Diffusion Models for Joint Generation
Masato Ishii
Akio Hayakawa
Takashi Shibuya
Yuki Mitsufuji
VGen
DiffM
65
4
0
26 Sep 2024
GALD-SE: Guided Anisotropic Lightweight Diffusion for Efficient Speech Enhancement
Chengzhong Wang
Jianjun Gu
Dingding Yao
Junfeng Li
Yonghong Yan
DiffM
131
0
0
23 Sep 2024
AudioEditor: A Training-Free Diffusion-Based Audio Editing Framework
Yuhang Jia
Yang Chen
Jinghua Zhao
Shiwan Zhao
Wenjia Zeng
Yong Chen
Yong Qin
DiffM
36
1
0
19 Sep 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Yishuo Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
45
2
0
19 Sep 2024
High-Resolution Speech Restoration with Latent Diffusion Model
Tushar Dhyani
Florian Lux
Michele Mancusi
Giorgio Fabbro
Fritz Hohl
Ngoc Thang Vu
DiffM
37
0
0
17 Sep 2024
Language-Queried Target Sound Extraction Without Parallel Training Data
Hao Ma
Zhiyuan Peng
Xu Li
Yukai Li
Mingjie Shao
Qiuqiang Kong
Ju Liu
VLM
77
1
0
14 Sep 2024
Sub-graph Based Diffusion Model for Link Prediction
Hang Li
Wei Jin
Geri Skenderi
Harry Shomer
Wenzhuo Tang
Wenqi Fan
Jiliang Tang
DiffM
33
0
0
13 Sep 2024
STA-V2A: Video-to-Audio Generation with Semantic and Temporal Alignment
Yong Ren
Chenxing Li
Manjie Xu
Wei Liang
Yu Gu
Rilin Chen
Dong Yu
VGen
DiffM
48
7
0
13 Sep 2024
Bridging Paintings and Music -- Exploring Emotion based Music Generation through Paintings
Tanisha Hisariya
Huan Zhang
Jinhua Liang
29
3
0
12 Sep 2024
InstructSing: High-Fidelity Singing Voice Generation via Instructing Yourself
Chang Zeng
Chunhui Wang
Xiaoxiao Miao
Jian Zhao
Zhonglin Jiang
Yong Chen
41
0
0
10 Sep 2024
Atlas Gaussians Diffusion for 3D Generation
Haitao Yang
Yuan Dong
Hanwen Jiang
Dejia Xu
Georgios Pavlakos
Qixing Huang
3DGS
81
3
0
23 Aug 2024
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Dong Yu
DiffM
VGen
43
11
0
10 Jul 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
31
12
0
08 Jul 2024
PAGURI: a user experience study of creative interaction with text-to-music models
Francesca Ronchini
Luca Comanducci
Gabriele Perego
Fabio Antonacci
35
3
0
05 Jul 2024
TimeLDM: Latent Diffusion Model for Unconditional Time Series Generation
Jian Qian
Miao Sun
Sifan Zhou
Biao Wan
Minhao Li
Patrick Chiang
39
7
0
05 Jul 2024
Subtractive Training for Music Stem Insertion using Latent Diffusion Models
Ivan Villa-Renteria
Mason L. Wang
Zachary Shah
Zhe Li
Soohyun Kim
Neelesh Ramachandran
Mert Pilanci
42
0
0
27 Jun 2024
MusicScore: A Dataset for Music Score Modeling and Generation
Yuheng Lin
Zheqi Dai
Qiuqiang Kong
VLM
37
2
0
17 Jun 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
Kaidi Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
36
3
0
12 Jun 2024
FakeSound: Deepfake General Audio Detection
Zeyu Xie
Baihan Li
Xuenan Xu
Zheng Liang
Kai Yu
Mengyue Wu
33
2
0
12 Jun 2024