Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2105.06337
Cited By
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
13 May 2021
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"
50 / 352 papers shown
Title
VoiceCloak: A Multi-Dimensional Defense Framework against Unauthorized Diffusion-based Voice Cloning
Qianyue Hu
Junyan Wu
Wei Lu
Xiangyang Luo
DiffM
AAML
2
0
0
18 May 2025
Language translation, and change of accent for speech-to-speech task using diffusion model
Abhishek Mishra
Ritesh Sur Chowdhury
Vartul Bahuguna
Isha Pandey
Ganesh Ramakrishnan
DiffM
44
0
0
04 May 2025
Voice Cloning: Comprehensive Survey
Hussam Azzuni
Abdulmotaleb El Saddik
VLM
44
0
0
01 May 2025
TriniMark: A Robust Generative Speech Watermarking Method for Trinity-Level Attribution
Yue Li
Wei Liu
Dongdong Lin
44
0
0
29 Apr 2025
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Yatong Bai
Jonah Casebeer
Somayeh Sojoudi
Nicholas J. Bryan
DiffM
VLM
48
1
0
21 Apr 2025
Generalized Audio Deepfake Detection Using Frame-level Latent Information Entropy
Botao Zhao
Zuheng Kang
Yayun He
Xiaoyang Qu
Junqing Peng
Jing Xiao
Jianzong Wang
23
0
0
15 Apr 2025
SD-ReID: View-aware Stable Diffusion for Aerial-Ground Person Re-Identification
Xiang Hu
Pingping Zhang
Yuhao Wang
Bin Yan
Huchuan Lu
25
0
0
13 Apr 2025
On the Design of Diffusion-based Neural Speech Codecs
Pietro Foti
Andreas Brendel
DiffM
39
0
0
11 Apr 2025
P2Mark: Plug-and-play Parameter-level Watermarking for Neural Speech Generation
Yong Ren
Jiangyan Yi
Tao Wang
J. Tao
Zhengqi Wen
Chenxing Li
Zheng Lian
Ruibo Fu
Ye Bai
Xiaohui Zhang
58
0
0
07 Apr 2025
SupertonicTTS: Towards Highly Scalable and Efficient Text-to-Speech System
Hyeongju Kim
Jinhyeok Yang
Yechan Yu
Seunghun Ji
Jacob Morton
Frederik Bous
Joon Byun
Juheon Lee
51
0
0
29 Mar 2025
Dual Audio-Centric Modality Coupling for Talking Head Generation
Ao Fu
Ziqi Ni
Yi Zhou
37
1
0
26 Mar 2025
From Faces to Voices: Learning Hierarchical Representations for High-quality Video-to-Speech
Ji-Hoon Kim
Jeongsoo Choi
Jaehun Kim
Chaeyoung Jung
Joon Son Chung
CVBM
53
1
0
21 Mar 2025
Serenade: A Singing Style Conversion Framework Based On Audio Infilling
Lester Phillip Violeta
Wen-Chin Huang
T. Toda
37
0
0
16 Mar 2025
Prosody-Enhanced Acoustic Pre-training and Acoustic-Disentangled Prosody Adapting for Movie Dubbing
Zhedong Zhang
Liang-Sheng Li
C. Yan
Chunshan Liu
Anton Van Den Hengel
Yuankai Qi
91
2
0
15 Mar 2025
MAVFlow: Preserving Paralinguistic Elements with Conditional Flow Matching for Zero-Shot AV2AV Multilingual Translation
Sungwoo Cho
J. Choi
Sungnyun Kim
Se-Young Yun
63
0
0
14 Mar 2025
On the Generalization Properties of Diffusion Models
Puheng Li
Zhong Li
Huishuai Zhang
Jiang Bian
74
29
0
13 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Y. Guo
67
3
0
13 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
61
0
0
11 Mar 2025
MM-StoryAgent: Immersive Narrated Storybook Video Generation with a Multi-Agent Paradigm across Text, Image and Audio
Xuenan Xu
Jiahao Mei
Chenliang Li
Yuning Wu
M. Yan
Shaopeng Lai
J.N. Zhang
Mengyue Wu
VGen
LLMAG
44
1
0
07 Mar 2025
A Dual-Purpose Framework for Backdoor Defense and Backdoor Amplification in Diffusion Models
Vu Tuan Truong Long
Bao Le
DiffM
AAML
213
0
0
26 Feb 2025
Everyday Speech in the Indian Subcontinent
Utkarsh Pathak
56
1
0
24 Feb 2025
DMOSpeech: Direct Metric Optimization via Distilled Diffusion Model in Zero-Shot Speech Synthesis
Yingahao Aaron Li
Rithesh Kumar
Zeyu Jin
DiffM
98
0
0
21 Feb 2025
RestoreGrad: Signal Restoration Using Conditional Denoising Diffusion Models with Jointly Learned Prior
Ching Hua Lee
Chouchang Yang
Jaejin Cho
Yashas Malur Saidutta
R. S. Srinivasa
Yilin Shen
Hongxia Jin
DiffM
88
0
0
19 Feb 2025
Diffusion-Sharpening: Fine-tuning Diffusion Models with Denoising Trajectory Sharpening
Ye Tian
L. Yang
Xinchen Zhang
Yunhai Tong
Mengdi Wang
Bin Cui
67
1
0
17 Feb 2025
BackdoorDM: A Comprehensive Benchmark for Backdoor Learning in Diffusion Model
Weilin Lin
Nanjun Zhou
Yijiao Wang
Jianze Li
Hui Xiong
Li Liu
AAML
DiffM
184
0
0
17 Feb 2025
Less is More for Synthetic Speech Detection in the Wild
Ashi Garg
Zexin Cai
Henry Li Xinyuan
Leibny Paola García-Perera
Kevin Duh
Sanjeev Khudanpur
Matthew Wiesner
Nicholas Andrews
74
0
0
17 Feb 2025
DiffGraph: Heterogeneous Graph Diffusion Model
Zongwei Li
Lianghao Xia
Hua Hua
Shijie Zhang
Shuangyang Wang
Chenyu Huang
41
0
0
04 Jan 2025
Cached Adaptive Token Merging: Dynamic Token Reduction and Redundant Computation Elimination in Diffusion Model
Omid Saghatchian
Atiyeh Gh. Moghadam
Ahmad Nickabadi
MoMe
49
1
0
03 Jan 2025
Face-StyleSpeech: Enhancing Zero-shot Speech Synthesis from Face Images with Improved Face-to-Speech Mapping
Minki Kang
Wooseok Han
Eunho Yang
CVBM
39
0
0
31 Dec 2024
EmoReg: Directional Latent Vector Modeling for Emotional Intensity Regularization in Diffusion-based Voice Conversion
Ashishkumar Gudmalwar
Ishan D. Biyani
Nirmesh J. Shah
Pankaj Wasnik
R. Shah
DiffM
26
0
0
31 Dec 2024
Stable-TTS: Stable Speaker-Adaptive Text-to-Speech Synthesis via Prosody Prompting
Wooseok Han
Minki Kang
Changhun Kim
Eunho Yang
40
0
0
31 Dec 2024
UIBDiffusion: Universal Imperceptible Backdoor Attack for Diffusion Models
Yuning Han
Bingyin Zhao
Rui Chu
Feng Luo
Biplab Sikdar
Yingjie Lao
DiffM
AAML
86
1
0
16 Dec 2024
SoftVQ-VAE: Efficient 1-Dimensional Continuous Tokenizer
Hongyu Chen
Zihan Wang
Xianrui Li
Xingchen Sun
Fangyi Chen
Jiang Liu
Jiadong Wang
Bhiksha Raj
Zicheng Liu
Emad Barsoum
VLM
114
7
0
14 Dec 2024
DiffStyleTTS: Diffusion-based Hierarchical Prosody Modeling for Text-to-Speech with Diverse and Controllable Styles
Jiaxuan Liu
Zhaoci Liu
Yihan Hu
Yingying Gao
Shilei Zhang
Zhenhua Ling
DiffM
83
2
0
04 Dec 2024
Enhancing Diffusion Posterior Sampling for Inverse Problems by Integrating Crafted Measurements
Shijie Zhou
Huaisheng Zhu
Rohan Sharma
R. Zhang
Kaiyi Ji
Changyou Chen
34
0
0
15 Nov 2024
Robust AI-Synthesized Speech Detection Using Feature Decomposition Learning and Synthesizer Feature Augmentation
Kuiyuan Zhang
Zhongyun Hua
Yushu Zhang
Yifang Guo
Tao Xiang
29
0
0
14 Nov 2024
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector
Deok-Hyeon Cho
Hyung-Seok Oh
Seung-Bin Kim
Seong-Whan Lee
46
4
0
04 Nov 2024
TPC: Test-time Procrustes Calibration for Diffusion-based Human Image Animation
Sunjae Yoon
Gwanhyeong Koo
Younghwan Lee
Chang D. Yoo
VGen
74
3
0
31 Oct 2024
RDSinger: Reference-based Diffusion Network for Singing Voice Synthesis
Kehan Sui
Jinxu Xiang
Fang Jin
DiffM
24
0
0
29 Oct 2024
One-Step Diffusion Policy: Fast Visuomotor Policies via Diffusion Distillation
Zhendong Wang
Zhiyu Li
Ajay Mandlekar
Zhenjia Xu
Jiaojiao Fan
...
Yuke Zhu
Yogesh Balaji
Mingyuan Zhou
Xuan Li
Yu Zeng
37
16
0
28 Oct 2024
Utilizing Image Transforms and Diffusion Models for Generative Modeling of Short and Long Time Series
Ilan Naiman
Nimrod Berman
Itai Pemper
Idan Arbiv
Gal Fadlon
Omri Azencot
32
11
0
25 Oct 2024
Making Social Platforms Accessible: Emotion-Aware Speech Generation with Integrated Text Analysis
Suparna De
Ionut Bostan
Nishanth Sastry
34
0
0
24 Oct 2024
Enhancing Low-Resource ASR through Versatile TTS: Bridging the Data Gap
Guanrou Yang
Fan Yu
Z. Ma
Zhihao Du
Zhifu Gao
Shiliang Zhang
Xie Chen
32
1
0
22 Oct 2024
Accelerating Codec-based Speech Synthesis with Multi-Token Prediction and Speculative Decoding
Tan Dat Nguyen
Ji-Hoon Kim
Jeongsoo Choi
Shukjae Choi
Jinseok Park
Younglo Lee
Joon Son Chung
37
0
0
17 Oct 2024
Enhancing Crowdsourced Audio for Text-to-Speech Models
José Giraldo
Martí Llopart-Font
Alex Peiró-Lilja
Carme Armentano-Oller
Gerard Sant
Baybars Külebi
DiffM
26
0
0
17 Oct 2024
Off-dynamics Conditional Diffusion Planners
Wen Zheng Terence Ng
Jianda Chen
Tianwei Zhang
DiffM
OffRL
35
0
0
16 Oct 2024
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
47
2
0
16 Oct 2024
Diffuse or Confuse: A Diffusion Deepfake Speech Dataset
Anton Firc
K. Malinka
P. Hanáček
DiffM
36
0
0
09 Oct 2024
SCOREQ: Speech Quality Assessment with Contrastive Regression
Alessandro Ragano
Jan Skoglund
Andrew Hines
40
6
0
09 Oct 2024
Bahasa Harmony: A Comprehensive Dataset for Bahasa Text-to-Speech Synthesis with Discrete Codec Modeling of EnGen-TTS
Onkar Kishor Susladkar
Vishesh Tripathi
Biddwan Ahmed
23
0
0
09 Oct 2024
1
2
3
4
5
6
7
8
Next