AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining

10 August 2023
Haohe Liu
Yi Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
    DiffM

Papers citing "AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining"

50 / 168 papers shown
Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models
J. Nistal
Marco Pasini
Cyran Aouameur
M. Grachten
Stefan Lattner
DiffM
53
16
0
12 Jun 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
Kaidi Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
36
3
0
12 Jun 2024
FakeSound: Deepfake General Audio Detection
Zeyu Xie
Baihan Li
Xuenan Xu
Zheng Liang
Kai Yu
Mengyue Wu
33
2
0
12 Jun 2024
Scaling up masked audio encoder learning for general audio classification
Heinrich Dinkel
Zhiyong Yan
Yongqing Wang
Junbo Zhang
Yujun Wang
Bin Wang
50
3
0
11 Jun 2024
LDM-SVC: Latent Diffusion Model Based Zero-Shot Any-to-Any Singing Voice Conversion with Singer Guidance
Shihao Chen
Yu Gu
Jie Zhang
Na Li
Rilin Chen
Liping Chen
Lirong Dai
DiffM
42
6
0
08 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
46
7
0
07 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
39
3
0
06 Jun 2024
Lumina-Next: Making Lumina-T2X Stronger and Faster with Next-DiT
Le Zhuo
Ruoyi Du
Han Xiao
Yangguang Li
Dongyang Liu
...
Wanli Ouyang
Ziwei Liu
Ping Luo
Hongsheng Li
Peng Gao
52
45
0
05 Jun 2024
An Independence-promoting Loss for Music Generation with Language Models
Jean-Marie Lemercier
Simon Rouard
Jade Copet
Yossi Adi
Alexandre Défossez
30
1
0
04 Jun 2024
MidiCaps: A large-scale MIDI dataset with text captions
J. Melechovský
Abhinaba Roy
Dorien Herremans
32
10
0
04 Jun 2024
AudioLCM: Text-to-Audio Generation with Latent Consistency Models
Huadai Liu
Rongjie Huang
Yang Liu
Hengyuan Cao
Jialei Wang
Xize Cheng
Siqi Zheng
Zhou Zhao
70
8
0
01 Jun 2024
Frieren: Efficient Video-to-Audio Generation Network with Rectified Flow Matching
Yongqi Wang
Wenxiang Guo
Rongjie Huang
Jia-Bin Huang
Zehan Wang
Fuming You
Ruiqi Li
Zhou Zhao
VGen
DiffM
31
12
0
01 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
56
0
0
31 May 2024
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
30
8
0
30 May 2024
Reverse the auditory processing pathway: Coarse-to-fine audio reconstruction from fMRI
Che Liu
Changde Du
Xiaoyu Chen
Huiguang He
36
2
0
29 May 2024
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Yixiao Zhang
Yukara Ikemiya
Woosung Choi
Naoki Murata
Marco A. Martínez-Ramírez
Liwei Lin
Gus Xia
Wei-Hsiang Liao
Yuki Mitsufuji
Simon Dixon
57
10
0
28 May 2024
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Chang Li
Ruoyu Wang
Lijuan Liu
Jun Du
Yixuan Sun
Zilu Guo
Zhenrong Zhang
Yuan Jiang
J. Gao
Feng Ma
41
1
0
24 May 2024
Images that Sound: Composing Images and Sounds on a Single Canvas
Ziyang Chen
Daniel Geng
Andrew Owens
DiffM
50
9
0
20 May 2024
Naturalistic Music Decoding from EEG Data via Latent Diffusion Models
Emilian Postolache
Natalia Polouliakh
Hiroaki Kitano
Akima Connelly
Emanuele Rodolà
Luca Cosmo
Taketo Akama
MedIm
DiffM
35
2
0
15 May 2024
Prompt-guided Precise Audio Editing with Diffusion Models
Manjie Xu
Chenxing Li
Duzhen Zhang
Dan Su
Weihan Liang
Dong Yu
DiffM
36
4
0
11 May 2024
Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
Tianze Xu
Jiajun Li
Xuesong Chen
Xinrui Yao
Shuchang Liu
32
4
0
05 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yi Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
35
18
0
30 Apr 2024
Music Consistency Models
Zhengcong Fei
Mingyuan Fan
Junshi Huang
DiffM
53
5
0
20 Apr 2024
Long-form music generation with latent diffusion
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
MGen
DiffM
44
39
0
16 Apr 2024
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Navonil Majumder
Chia-Yu Hung
Deepanway Ghosal
Wei-Ning Hsu
Rada Mihalcea
Soujanya Poria
47
52
0
15 Apr 2024
VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing
Philip Anastassiou
Zhenyu Tang
Kainan Peng
Dongya Jia
Jiaxin Li
Ming Tu
Yuping Wang
Yuxuan Wang
Mingbo Ma
42
4
0
10 Apr 2024
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
Junghyun Koo
G. Wichern
François Germain
Sameer Khurana
Jonathan Le Roux
34
3
0
02 Apr 2024
Synthetic training set generation using text-to-audio models for environmental sound classification
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
37
2
0
26 Mar 2024
Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
Emilian Postolache
Giorgio Mariani
Luca Cosmo
Emmanouil Benetos
Emanuele Rodolà
DiffM
43
9
0
18 Mar 2024
Training Machine Learning models at the Edge: A Survey
Aymen Rayane Khouas
Mohamed Reda Bouadjenek
Hakim Hacid
Sunil Aryal
29
10
0
05 Mar 2024
Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners
Yazhou Xing
Yin-Yin He
Zeyue Tian
Xintao Wang
Qifeng Chen
35
52
0
27 Feb 2024
Zero-Shot Unsupervised and Text-Based Audio Editing Using DDPM Inversion
Hila Manor
T. Michaeli
DiffM
29
25
0
15 Feb 2024
Can Text-to-image Model Assist Multi-modal Learning for Visual Recognition with Visual Modality Missing?
Tiantian Feng
Daniel Yang
Digbalay Bose
Shrikanth Narayanan
40
5
0
14 Feb 2024
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models
Yixiao Zhang
Yukara Ikemiya
Gus Xia
Naoki Murata
Marco A. Martínez-Ramírez
Wei-Hsiang Liao
Yuki Mitsufuji
Simon Dixon
47
20
0
09 Feb 2024
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
82
102
0
07 Feb 2024
Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience
Xilin Jiang
Cong Han
Yinghao Aaron Li
N. Mesgarani
KELM
34
5
0
06 Feb 2024
PAM: Prompting Audio-Language Models for Audio Quality Assessment
Soham Deshmukh
Dareen Alharthi
Benjamin Elizalde
Hannes Gamper
Mahmoud Al Ismail
Rita Singh
Bhiksha Raj
Huaming Wang
29
11
0
01 Feb 2024
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
DiffM
34
33
0
22 Jan 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer
Alon Ziv
Itai Gat
Gaël Le Lan
Tal Remez
Felix Kreuk
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
54
36
0
09 Jan 2024
Auffusion: Leveraging the Power of Diffusion and Large Language Models for Text-to-Audio Generation
Jinlong Xue
Yayue Deng
Yingming Gao
Ya Li
DiffM
23
29
0
02 Jan 2024
Audiobox: Unified Audio Generation with Natural Language Prompts
Apoorv Vyas
Bowen Shi
Matt Le
Andros Tjandra
Yi-Chiao Wu
...
Chris Summers
Carleigh Wood
Joshua Lane
Mary Williamson
Wei-Ning Hsu
55
76
0
25 Dec 2023
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
29
28
0
15 Dec 2023
InstructAny2Pix: Flexible Visual Editing via Multimodal Instruction Following
Shufan Li
Harkanwar Singh
Aditya Grover
DiffM
22
7
0
11 Dec 2023
CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling
Ruihan Yang
H. Gamper
Sebastian Braun
DiffM
32
5
0
08 Dec 2023
Schrodinger Bridges Beat Diffusion Models on Text-to-Speech Synthesis
Zehua Chen
Guande He
Kaiwen Zheng
Xu Tan
Jun Zhu
DiffM
56
21
0
06 Dec 2023
CoDi-2: In-Context, Interleaved, and Interactive Any-to-Any Generation
Zineng Tang
Ziyi Yang
Mahmoud Khademi
Yang Liu
Chenguang Zhu
Mohit Bansal
LRM
MLLM
AuLLM
54
45
0
30 Nov 2023
Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Xingqun Qi
Jiahao Pan
Peng Li
Ruibin Yuan
Xiaowei Chi
...
Wenhan Luo
Wei Xue
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
SLR
34
11
0
29 Nov 2023
Musical Form Generation
Lilac Atassi
13
0
0
30 Oct 2023
Audio Editing with Non-Rigid Text Prompts
Francesco Paissan
Luca Della Libera
Zhepei Wang
Mirco Ravanelli
Paris Smaragdis
Cem Subakan
DiffM
46
5
0
19 Oct 2023
Loop Copilot: Conducting AI Ensembles for Music Generation and Iterative Editing
Yixiao Zhang
Akira Maezawa
Gus Xia
Kazuhiko Yamamoto
Simon Dixon
49
17
0
19 Oct 2023