Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2207.09983
Cited By
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
20 July 2022
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Diffsound: Discrete Diffusion Model for Text-to-sound Generation"
50 / 59 papers shown
Title
Computationally Efficient Diffusion Models in Medical Imaging: A Comprehensive Review
Abdullah
Tao Huang
Ickjai Lee
Euijoon Ahn
MedIm
26
0
0
09 May 2025
Denoising Diffusion Probabilistic Models for Coastal Inundation Forecasting
Kazi Ashik Islam
Zakaria Mehrab
Mahantesh Halappanavar
H. Mortveit
Sridhar Katragadda
Jon Derek Loftis
Madhav V. Marathe
DiffM
AI4CE
42
0
0
08 May 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kaipeng Zhang
MGen
VGen
70
1
0
01 Apr 2025
HiPART: Hierarchical Pose AutoRegressive Transformer for Occluded 3D Human Pose Estimation
Hongwei Zheng
Han Li
Wenrui Dai
Ziyang Zheng
Chenglin Li
Junni Zou
Hongkai Xiong
3DH
60
0
0
30 Mar 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
52
0
0
15 Mar 2025
Bayesian Computation in Deep Learning
Wenlong Chen
Bolian Li
Ruqi Zhang
Yingzhen Li
BDL
75
0
0
25 Feb 2025
Do we really have to filter out random noise in pre-training data for language models?
Jinghan Ru
Yuxin Xie
Xianwei Zhuang
Yuguo Yin
Zhihui Guo
Zhiming Liu
Qianli Ren
Yuexian Zou
83
2
0
10 Feb 2025
Simplified and Generalized Masked Diffusion for Discrete Data
Jiaxin Shi
Kehang Han
Zehao Wang
Arnaud Doucet
Michalis K. Titsias
DiffM
85
62
0
17 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
Text2Data: Low-Resource Data Generation with Textual Control
Shiyu Wang
Yihao Feng
Tian Lan
Ning Yu
Yu Bai
Ran Xu
David W. Romero
Caiming Xiong
Siyang Song
DiffM
85
0
0
03 Jan 2025
Spider: Any-to-Many Multimodal LLM
Jinxiang Lai
Jie Zhang
Jun Liu
Jian Li
Xiaocheng Lu
Song Guo
MLLM
69
2
0
14 Nov 2024
Multi-Scale Diffusion: Enhancing Spatial Layout in High-Resolution Panoramic Image Generation
Xiaoyu Zhang
Teng Zhou
Xinlong Zhang
Jia Wei
Yongchuan Tang
44
1
0
24 Oct 2024
BrainECHO: Semantic Brain Signal Decoding through Vector-Quantized Spectrogram Reconstruction for Whisper-Enhanced Text Generation
Juntao Li
Zhenxi Song
Jiaqi Wang
Min Zhang
Honghai Liu
Min Zhang
Zhiguo Zhang
31
1
0
19 Oct 2024
How Discrete and Continuous Diffusion Meet: Comprehensive Analysis of Discrete Diffusion Models via a Stochastic Integral Framework
Yinuo Ren
Haoxuan Chen
Grant M. Rotskoff
Lexing Ying
47
3
0
04 Oct 2024
DNI: Dilutional Noise Initialization for Diffusion Video Editing
Sunjae Yoon
Gwanhyeong Koo
Ji Woo Hong
Chang D. Yoo
DiffM
43
2
0
19 Sep 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Yuping Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
45
2
0
19 Sep 2024
MambaFoley: Foley Sound Generation using Selective State-Space Models
Marco Furio Colombo
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
Mamba
25
1
0
13 Sep 2024
Read, Watch and Scream! Sound Generation from Text and Video
Yujin Jeong
Yunji Kim
Sanghyuk Chun
Jiyoung Lee
VGen
DiffM
31
12
0
08 Jul 2024
PAGURI: a user experience study of creative interaction with text-to-music models
Francesca Ronchini
Luca Comanducci
Gabriele Perego
Fabio Antonacci
35
3
0
05 Jul 2024
EmoAttack: Emotion-to-Image Diffusion Models for Emotional Backdoor Generation
Tianyu Wei
Shanmin Pang
Qi Guo
Yizhuo Ma
Yihao Huang
Ming-Ming Cheng
Qing Guo
132
2
0
22 Jun 2024
LAFMA: A Latent Flow Matching Model for Text-to-Audio Generation
Wenhao Guan
Kaixin Wang
Wangjin Zhou
Yang Wang
Feng Deng
Hui Wang
Lin Li
Q. Hong
Yong Qin
DiffM
36
3
0
12 Jun 2024
FakeSound: Deepfake General Audio Detection
Zeyu Xie
Baihan Li
Xuenan Xu
Zheng Liang
Kai Yu
Mengyue Wu
33
2
0
12 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Y. Guo
VGen
102
16
0
06 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
38
3
0
05 Jun 2024
AudioSetMix: Enhancing Audio-Language Datasets with LLM-Assisted Augmentations
David Xu
23
2
0
17 May 2024
Prompt-guided Precise Audio Editing with Diffusion Models
Manjie Xu
Chenxing Li
Duzhen Zhang
Dan Su
Weihan Liang
Dong Yu
DiffM
36
4
0
11 May 2024
Synthetic training set generation using text-to-audio models for environmental sound classification
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
37
2
0
26 Mar 2024
Denoising Task Difficulty-based Curriculum for Training Diffusion Models
Jin-Young Kim
Hyojun Go
Soonwoo Kwon
Hyun-Gyoon Kim
DiffM
56
6
0
15 Mar 2024
LLMBind: A Unified Modality-Task Integration Framework
Bin Zhu
Munan Ning
Peng Jin
Bin Lin
Jinfa Huang
...
Junwu Zhang
Zhenyu Tang
Mingjun Pan
Xing Zhou
Li-ming Yuan
MLLM
40
6
0
22 Feb 2024
Quantized Embedding Vectors for Controllable Diffusion Language Models
Cheng Kang
Xinye Chen
Yong Hu
Daniel Novak
31
0
0
15 Feb 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
28
2
0
09 Jan 2024
Amphion: An Open-Source Audio, Music and Speech Generation Toolkit
Xueyao Zhang
Liumeng Xue
Yicheng Gu
Yuancheng Wang
Haorui He
...
Mingxuan Wang
Jun Han
Kai Chen
Haizhou Li
Zhizheng Wu
29
26
0
15 Dec 2023
miditok: A Python package for MIDI file tokenization
Nathan Fradet
Jean-Pierre Briot
F. Chhel
A. E. Seghrouchni
Nicolas Gutowski
32
39
0
26 Oct 2023
NADiffuSE: Noise-aware Diffusion-based Model for Speech Enhancement
Wen Wang
Dongchao Yang
Qichen Ye
Bowen Cao
Yuexian Zou
DiffM
37
3
0
03 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin
Tao Li
Chenxu Hu
Jian Cong
Xinfa Zhu
Jingbei Li
Qiao Tian
Yuping Wang
Linfu Xie
DiffM
38
8
0
02 Sep 2023
AI-Generated Content (AIGC) for Various Data Modalities: A Survey
Lin Geng Foo
Hossein Rahmani
Xiaozhong Liu
78
31
0
27 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
30
223
0
10 Aug 2023
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGen
VLM
DiffM
70
37
0
09 Aug 2023
Squeezing Large-Scale Diffusion Models for Mobile
Jiwoong Choi
Minkyu Kim
Daehyun Ahn
Taesu Kim
Yulhwa Kim
Do-Hyun Jo
H. Jeon
Jae-Joon Kim
Hyungjun Kim
31
9
0
03 Jul 2023
Confronting Ambiguity in 6D Object Pose Estimation via Score-Based Diffusion on SE(3)
Tsu-Ching Hsiao
Haoming Chen
Hsuan-Kung Yang
Chun-Yi Lee
DiffM
23
7
0
25 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
20
17
0
22 May 2023
DiffUCD:Unsupervised Hyperspectral Image Change Detection with Semantic Correlation Diffusion Model
Xiangrong Zhang
Shunli Tian
Guanchun Wang
Huiyu Zhou
Licheng Jiao
DiffM
50
6
0
21 May 2023
A-CAP: Anticipation Captioning with Commonsense Knowledge
D. Vo
Quoc-An Luong
Akihiro Sugimoto
Hideki Nakayama
27
2
0
13 Apr 2023
DiffTAD: Temporal Action Detection with Proposal Denoising Diffusion
Sauradip Nag
Xiatian Zhu
Jiankang Deng
Yi-Zhe Song
Tao Xiang
DiffM
VGen
41
21
0
27 Mar 2023
PDPP: Projected Diffusion for Procedure Planning in Instructional Videos
Hanlin Wang
Yilu Wu
Sheng Guo
Limin Wang
VGen
DiffM
73
30
0
26 Mar 2023
DLT: Conditioned layout generation with Joint Discrete-Continuous Diffusion Layout Transformer
Elad Levi
Eli Brosh
Mykola Mykhailych
Meir Perez
DiffM
53
16
0
07 Mar 2023
Can We Use Diffusion Probabilistic Models for 3D Motion Prediction?
Hyemin Ahn
Esteve Valls Mascaro
Dongheui Lee
VGen
DiffM
16
22
0
28 Feb 2023
DIFUSCO: Graph-based Diffusion Solvers for Combinatorial Optimization
Zhiqing Sun
Yiming Yang
DiffM
33
118
0
16 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
31
85
0
31 Jan 2023
DiffusionDet: Diffusion Model for Object Detection
Shoufa Chen
Pei Sun
Yibing Song
Ping Luo
63
443
0
17 Nov 2022
1
2
Next