AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

10 August 2023
Haohe Liu
Yi Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
    DiffM

Papers citing "AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining"

18 / 168 papers shown
Title
Extending Multi-modal Contrastive Representations
Zehan Wang
Ziang Zhang
Luping Liu
Yang Zhao
Haifeng Huang
Tao Jin
Zhou Zhao
26
5
0
13 Oct 2023
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction
Xiang Hao
Jibin Wu
Jianwei Yu
Chenglin Xu
Kay Chen Tan
32
10
0
11 Oct 2023
uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models
Muqiao Yang
Chunlei Zhang
Yong Xu
Zhongweiyang Xu
Heming Wang
Bhiksha Raj
Dong Yu
DiffM
31
4
0
02 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xu Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
28
115
0
01 Oct 2023
VoiceLDM: Text-to-Speech with Environmental Context
Yeonghyeon Lee
Inmo Yeon
Juhan Nam
Joon Son Chung
VLM
DiffM
22
10
0
24 Sep 2023
ConsistencyTTA: Accelerating Diffusion-Based Text-to-Audio Generation with Consistency Distillation
Yatong Bai
Trung D. Q. Dang
Dung N. Tran
K. Koishida
Somayeh Sojoudi
DiffM
52
22
0
19 Sep 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
21
21
0
19 Sep 2023
Enhance audio generation controllability through representation similarity regularization
Yangyang Shi
Gaël Le Lan
Varun K. Nagaraja
Zhaoheng Ni
Xinhao Mei
Ernie Chang
Forrest N. Iandola
Yang Liu
Vikas Chandra
42
1
0
15 Sep 2023
AudioSR: Versatile Audio Super-resolution at Scale
Haohe Liu
Ke Chen
Qiao Tian
Wenwu Wang
Mark D. Plumbley
DiffM
18
21
0
13 Sep 2023
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MA
AuLLM
33
38
0
24 Aug 2023
V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models
Heng Wang
Jianbo Ma
Santiago Pascual
Richard Cartwright
Weidong (Tom) Cai
VGen
21
39
0
18 Aug 2023
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
152
144
0
24 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
46
194
0
30 Mar 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jiawei Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiang Yin
Zhou Zhao
DiffM
151
317
0
30 Jan 2023
Simple Pooling Front-ends For Efficient Audio Classification
Xubo Liu
Haohe Liu
Qiuqiang Kong
Xinhao Mei
Mark D. Plumbley
Wenwu Wang
46
16
0
03 Oct 2022
BinauralGrad: A Two-Stage Conditional Diffusion Probabilistic Model for Binaural Audio Synthesis
Yichong Leng
Zehua Chen
Junliang Guo
Haohe Liu
Jiawei Chen
...
Lei He
Xiang-Yang Li
Tao Qin
Sheng Zhao
Tie-Yan Liu
DiffM
53
58
0
30 May 2022
Decoupling Magnitude and Phase Estimation with Deep ResUNet for Music Source Separation
Qiuqiang Kong
Yin Cao
Haohe Liu
Keunwoo Choi
Yuxuan Wang
118
96
0
12 Sep 2021
DDSP: Differentiable Digital Signal Processing
Jesse Engel
Lamtharn Hantrakul
Chenjie Gu
Adam Roberts
DiffM
94
373
0
14 Jan 2020