ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

AudioLDM: Text-to-Audio Generation with Latent Diffusion Models (arXiv:2301.12503)

29 January 2023
Haohe Liu
Zehua Chen
Yi Yuan
Xinhao Mei
Xubo Liu
Danilo Mandic
Wenwu Wang
Mark D. Plumbley
    DiffM

Papers citing "AudioLDM: Text-to-Audio Generation with Latent Diffusion Models"

43 / 93 papers shown
Generative Diffusion Models for Fast Simulations of Particle Collisions at CERN
Mikołaj Kita
Jan Dubiński
Przemysław Rokita
Kamil Deja
DiffM
AI4CE
43
2
0
05 Jun 2024
SysCaps: Language Interfaces for Simulation Surrogates of Complex Systems
Patrick Emami
Zhaonan Li
Saumya Sinha
Truc Nguyen
56
1
0
30 May 2024
X-VILA: Cross-Modality Alignment for Large Language Model
Hanrong Ye
De-An Huang
Yao Lu
Zhiding Yu
Ming-Yu Liu
...
Jan Kautz
Song Han
Dan Xu
Pavlo Molchanov
Hongxu Yin
MLLM
VLM
45
30
0
29 May 2024
AdjointDEIS: Efficient Gradients for Diffusion Models
Zander W. Blasingame
Chen Liu
DiffM
51
2
0
23 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
50
9
0
20 May 2024
Prompt-guided Precise Audio Editing with Diffusion Models
Manjie Xu
Chenxing Li
Duzhen Zhang
Dan Su
Weihan Liang
Dong Yu
DiffM
36
4
0
11 May 2024
TwinDiffusion: Enhancing Coherence and Efficiency in Panoramic Image Generation with Diffusion Models
Teng Zhou
Yongchuan Tang
DiffM
48
2
0
30 Apr 2024
Bridge to Non-Barrier Communication: Gloss-Prompted Fine-grained Cued Speech Gesture Generation with Diffusion Model
Wen-Ling Lei
Li Liu
Jun Wang
DiffM
35
2
0
30 Apr 2024
Synthetic training set generation using text-to-audio models for environmental sound classification
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
37
2
0
26 Mar 2024
Correlation of Fréchet Audio Distance With Human Perception of Environmental Audio Is Embedding Dependant
Modan Tailleur
Junwon Lee
Mathieu Lagrange
Keunwoo Choi
Laurie M. Heller
Keisuke Imoto
Yuki Okamoto
30
10
0
26 Mar 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
56
17
0
28 Feb 2024
LLMBind: A Unified Modality-Task Integration Framework
Bin Zhu
Munan Ning
Peng Jin
Bin Lin
Jinfa Huang
...
Junwu Zhang
Zhenyu Tang
Mingjun Pan
Xing Zhou
Li-ming Yuan
MLLM
40
6
0
22 Feb 2024
Music Style Transfer with Time-Varying Inversion of Diffusion Models
Sifei Li
Yuxin Zhang
Fan Tang
Chongyang Ma
Weiming Dong
Changsheng Xu
DiffM
40
11
0
21 Feb 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
28
2
0
09 Jan 2024
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
29
28
0
15 Dec 2023
Investigating the Design Space of Diffusion Models for Speech Enhancement
Philippe Gonzalez
Zheng-Hua Tan
Jan Østergaard
Jesper Jensen
T. S. Alstrøm
Tobias May
DiffM
30
6
0
07 Dec 2023
FitDiff: Robust monocular 3D facial shape and reflectance estimation using Diffusion Models
Stathis Galanakis
Alexandros Lattas
Stylianos Moschoglou
S. Zafeiriou
33
2
0
07 Dec 2023
JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live
Sven Hollowell
Tashi Namgyal
Paul Marshall
27
0
0
06 Dec 2023
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi
Jiahao Pan
Peng Li
Ruibin Yuan
Xiaowei Chi
...
Wenhan Luo
Wei Xue
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
SLR
34
11
0
29 Nov 2023
Content-based Controls For Music Large Language Modeling
Liwei Lin
Gus Xia
Junyan Jiang
Yixiao Zhang
18
14
0
26 Oct 2023
Matryoshka Diffusion Models
Jiatao Gu
Shuangfei Zhai
Yizhen Zhang
Joshua M. Susskind
Navdeep Jaitly
DiffM
21
43
0
23 Oct 2023
Audio Editing with Non-Rigid Text Prompts
Francesco Paissan
Luca Della Libera
Zhepei Wang
Mirco Ravanelli
Paris Smaragdis
Cem Subakan
DiffM
46
5
0
19 Oct 2023
CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models
Sreyan Ghosh
Ashish Seth
Sonal Kumar
Utkarsh Tyagi
Chandra Kiran Reddy Evuru
S. Ramaneswaran
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
VLM
CoGe
40
23
0
12 Oct 2023
Investigating Personalization Methods in Text to Music Generation
Manos Plitsis
Theodoros Kouzelis
Georgios Paraskevopoulos
V. Katsouros
Yannis Panagakis
DiffM
32
10
0
20 Sep 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yi Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
36
224
0
10 Aug 2023
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGen
VLM
DiffM
72
37
0
09 Aug 2023
Squeezing Large-Scale Diffusion Models for Mobile
Jiwoong Choi
Minkyu Kim
Daehyun Ahn
Taesu Kim
Yulhwa Kim
Do-Hyun Jo
H. Jeon
Jae-Joon Kim
Hyungjun Kim
31
9
0
03 Jul 2023
Solving Linear Inverse Problems Provably via Posterior Sampling with Latent Diffusion Models
Litu Rout
Negin Raoof
Giannis Daras
C. Caramanis
A. Dimakis
Sanjay Shakkottai
DiffM
38
93
0
02 Jul 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
41
10
0
22 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
20
17
0
22 May 2023
Description and Discussion on DCASE 2023 Challenge Task 2: First-Shot Unsupervised Anomalous Sound Detection for Machine Condition Monitoring
Kota Dohi
Keisuke Imoto
Noboru Harada
Daisuke Niizumi
Yuma Koizumi
Tomoya Nishida
Harsh Purohit
Ryo Tanabe
Takashi Endo
Y. Kawaguchi
11
37
0
13 May 2023
CoMoSpeech: One-Step Speech and Singing Voice Synthesis via Consistency Model
Zhe Ye
Wei Xue
Xuejiao Tan
Jie Chen
Qi-fei Liu
Yi-Ting Guo
DiffM
30
40
0
11 May 2023
Your Diffusion Model is Secretly a Zero-Shot Classifier
Alexander C. Li
Mihir Prabhudesai
Shivam Duggal
Ellis L Brown
Deepak Pathak
DiffM
VLM
55
226
0
28 Mar 2023
DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer
Elad Levi
Eli Brosh
Mykola Mykhailych
Meir Perez
DiffM
58
16
0
07 Mar 2023
Learning Temporal Resolution in Spectrogram for Audio Classification
Haohe Liu
Xubo Liu
Qiuqiang Kong
Wenwu Wang
Mark D. Plumbley
34
7
0
04 Oct 2022
Simple Pooling Front-ends For Efficient Audio Classification
Xubo Liu
Haohe Liu
Qiuqiang Kong
Xinhao Mei
Mark D. Plumbley
Wenwu Wang
46
16
0
03 Oct 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
Yichong Leng
Zehua Chen
Junliang Guo
Haohe Liu
Jiawei Chen
...
Lei He
Xiang-Yang Li
Tao Qin
Sheng Zhao
Tie-Yan Liu
DiffM
53
58
0
30 May 2022
HTS-AT: A Hierarchical Token-Semantic Audio Transformer for Sound Classification and Detection
Ke Chen
Xingjian Du
Bilei Zhu
Zejun Ma
Taylor Berg-Kirkpatrick
Shlomo Dubnov
ViT
127
264
0
02 Feb 2022
Audio-to-Image Cross-Modal Generation
Maciej Żelaszczyk
Jacek Mańdziuk
DiffM
53
15
0
27 Sep 2021
Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation
Qiuqiang Kong
Yin Cao
Haohe Liu
Keunwoo Choi
Yuxuan Wang
118
96
0
12 Sep 2021
PSLA: Improving Audio Tagging with Pretraining, Sampling, Labeling, and Aggregation
Yuan Gong
Yu-An Chung
James R. Glass
VLM
104
144
0
02 Feb 2021
DDSP: Differentiable Digital Signal Processing
Jesse Engel
Lamtharn Hantrakul
Chenjie Gu
Adam Roberts
DiffM
94
373
0
14 Jan 2020
Image-to-Image Translation with Conditional Adversarial Networks
Phillip Isola
Jun-Yan Zhu
Tinghui Zhou
Alexei A. Efros
SSeg
212
19,455
0
21 Nov 2016