2105.06337
Cited By
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
13 May 2021
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
Papers citing
"Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"
50 / 352 papers shown
Title
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
46
7
0
07 Jun 2024
RecDiff: Diffusion Model for Social Recommendation
Zongwei Li
Lianghao Xia
Chao Huang
42
14
0
01 Jun 2024
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
56
0
0
31 May 2024
Towards Black-Box Membership Inference Attack for Diffusion Models
Jingwei Li
Jingyi Dong
Tianxing He
Jingzhao Zhang
38
3
0
25 May 2024
DiffNorm: Self-Supervised Normalization for Non-autoregressive Speech-to-speech Translation
Weiting Tan
Jingyu Zhang
Lingfeng Shen
Daniel Khashabi
Philipp Koehn
32
0
0
22 May 2024
Faces that Speak: Jointly Synthesising Talking Face and Speech from Text
Youngjoon Jang
Ji-Hoon Kim
Junseok Ahn
Doyeop Kwak
Hong-Sun Yang
Yooncheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
CVBM
31
9
0
16 May 2024
CTS: A Consistency-Based Medical Image Segmentation Model
Kejia Zhang
Lan Zhang
Haiwei Pan
Baolong Yu
MedIm
DiffM
40
1
0
15 May 2024
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
Jianyi Chen
Wei Xue
Xu Tan
Zhen Ye
Qi-fei Liu
Yi-Ting Guo
47
2
0
13 May 2024
Diff-ETS: Learning a Diffusion Probabilistic Model for Electromyography-to-Speech Conversion
Zhao Ren
Kevin Scheck
Qinhan Hou
Stefano van Gogh
Michael Wand
Tanja Schultz
DiffM
36
0
0
11 May 2024
Shape Conditioned Human Motion Generation with Diffusion Model
Kebing Xue
Hyewon Seo
DiffM
35
2
0
10 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
47
15
0
08 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
35
18
0
30 Apr 2024
Attention-Constrained Inference for Robust Decoder-Only Text-to-Speech
Hankun Wang
Chenpeng Du
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
40
1
0
30 Apr 2024
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
46
4
0
30 Apr 2024
USAT: A Universal Speaker-Adaptive Text-to-Speech Approach
Wenbin Wang
Yang Song
Sanjay Jha
42
10
0
28 Apr 2024
FlashSpeech: Efficient Zero-Shot Speech Synthesis
Zhen Ye
Zeqian Ju
Haohe Liu
Xu Tan
Jianyi Chen
...
Weizhen Bian
Shulin He
Qi-fei Liu
Yi-Ting Guo
Wei Xue
38
16
0
23 Apr 2024
RF-Diffusion: Radio Signal Generation via Time-Frequency Diffusion
Guoxuan Chi
Zheng Yang
Chenshu Wu
Jingao Xu
Yuchong Gao
Yunhao Liu
Tony Xiao Han
DiffM
54
29
0
14 Apr 2024
Applying Guidance in a Limited Interval Improves Sample and Distribution Quality in Diffusion Models
Tuomas Kynkaanniemi
M. Aittala
Tero Karras
S. Laine
Timo Aila
J. Lehtinen
19
58
0
11 Apr 2024
ConsistencyDet: A Few-step Denoising Framework for Object Detection Using the Consistency Model
Lifan Jiang
Zhihui Wang
Changmiao Wang
Ming Li
Jiaxu Leng
DiffM
28
0
0
11 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
34
1
0
10 Apr 2024
CM-TTS: Enhancing Real Time Text-to-Speech Synthesis Efficiency through Weighted Samplers and Consistency Models
Xiang Li
Fan Bu
Ambuj Mehrish
Yingting Li
Jiale Han
Bo Cheng
Soujanya Poria
DiffM
34
6
0
31 Mar 2024
Improving Diffusion Models's Data-Corruption Resistance using Scheduled Pseudo-Huber Loss
Artem Khrapov
Vadim Popov
Tasnima Sadekova
Assel Yermekova
Mikhail Kudinov
DiffM
41
1
0
25 Mar 2024
GetMesh: A Controllable Model for High-quality Mesh Generation and Manipulation
Zhaoyang Lyu
Ben Fei
Jinyi Wang
Xudong Xu
Ya Zhang
Weidong Yang
Bo Dai
26
4
0
18 Mar 2024
Tackling the Singularities at the Endpoints of Time Intervals in Diffusion Models
Pengze Zhang
Hubery Yin
Chen Li
Xiaohua Xie
40
5
0
13 Mar 2024
EM-TTS: Efficiently Trained Low-Resource Mongolian Lightweight Text-to-Speech
Ziqi Liang
Haoxiang Shi
Jiawei Wang
Keda Lu
43
0
0
13 Mar 2024
HAM-TTS: Hierarchical Acoustic Modeling for Token-Based Zero-Shot Text-to-Speech with Model and Data Scaling
Chunhui Wang
Chang Zeng
Bowen Zhang
Ziyang Ma
Yefan Zhu
Zifeng Cai
Jian Zhao
Zhonglin Jiang
Yong Chen
SyDa
44
5
0
09 Mar 2024
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
49
144
0
05 Mar 2024
Trajectory Consistency Distillation: Improved Latent Consistency Distillation by Semi-Linear Consistency Function with Trajectory Mapping
Jianbin Zheng
Minghui Hu
Zhongyi Fan
Chaoyue Wang
Changxing Ding
Dacheng Tao
Tat-Jen Cham
43
26
0
29 Feb 2024
Structure-Guided Adversarial Training of Diffusion Models
Ling Yang
Haotian Qian
Zhilong Zhang
Jingwei Liu
Bin Cui
25
10
0
27 Feb 2024
Contextualized Diffusion Models for Text-Guided Image and Video Generation
Ling Yang
Zhilong Zhang
Zhaochen Yu
Jingwei Liu
Minkai Xu
Stefano Ermon
Bin Cui
44
4
0
26 Feb 2024
An Automated End-to-End Open-Source Software for High-Quality Text-to-Speech Dataset Generation
Ahmet Gunduz
K. Yuksel
Kareem Darwish
Golara Javadi
Fabio Minazzi
Nicola Sobieski
Sebastien Bratieres
25
0
0
26 Feb 2024
Daisy-TTS: Simulating Wider Spectrum of Emotions via Prosody Embedding Decomposition
Rendi Chevi
Alham Fikri Aji
25
2
0
22 Feb 2024
Compression Robust Synthetic Speech Detection Using Patched Spectrogram Transformer
Amit Kumar Singh Yadav
Ziyue Xiang
Kratika Bhagtani
Paolo Bestagini
Stefano Tubaro
Edward J. Delp
ViT
49
2
0
22 Feb 2024
SingVisio: Visual Analytics of Diffusion Model for Singing Voice Conversion
Liumeng Xue
Chaoren Wang
Mingxuan Wang
Xueyao Zhang
Jun Han
Zhizheng Wu
DiffM
32
5
0
20 Feb 2024
On the Semantic Latent Space of Diffusion-Based Text-to-Speech Models
Miri Varshavsky-Hassid
Roy Hirsch
Regev Cohen
Tomer Golany
Daniel Freedman
Ehud Rivlin
28
3
0
19 Feb 2024
DiffPoint: Single and Multi-view Point Cloud Reconstruction with ViT Based Diffusion Model
Yu Feng
Xing Shi
Mengli Cheng
Yun Xiong
19
0
0
17 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
38
72
0
12 Feb 2024
Towards a mathematical theory for consistency training in diffusion models
Gen Li
Zhihan Huang
Yuting Wei
69
16
0
12 Feb 2024
Bringing Generative AI to Adaptive Learning in Education
Hang Li
Tianlong Xu
Chaoli Zhang
Eason Chen
Jing Liang
Xing Fan
Haoyang Li
Jiliang Tang
Qingsong Wen
48
20
0
02 Feb 2024
Convergence Analysis for General Probability Flow ODEs of Diffusion Models in Wasserstein Distances
Xuefeng Gao
Lingjiong Zhu
38
20
0
31 Jan 2024
Topology-Aware Latent Diffusion for 3D Shape Generation
Jiangbei Hu
Ben Fei
Baixin Xu
Fei Hou
Weidong Yang
Shengfa Wang
Na Lei
Chen Qian
Ying He
40
7
0
31 Jan 2024
VALL-T: Decoder-Only Generative Transducer for Robust and Decoding-Controllable Text-to-Speech
Chenpeng Du
Yiwei Guo
Hankun Wang
Yifan Yang
Zhikang Niu
Shuai Wang
Hui Zhang
Xie Chen
Kai Yu
VLM
30
25
0
25 Jan 2024
FreGrad: Lightweight and Fast Frequency-aware Diffusion Vocoder
Tan Dat Nguyen
Ji-Hoon Kim
Youngjoon Jang
Jaehun Kim
Joon Son Chung
DiffM
41
5
0
18 Jan 2024
UniVG: Towards UNIfied-modal Video Generation
Ludan Ruan
Lei Tian
Chuanwei Huang
Xu Zhang
Xinyan Xiao
VGen
DiffM
34
3
0
17 Jan 2024
ED-TTS: Multi-Scale Emotion Modeling using Cross-Domain Emotion Diarization for Emotional Speech Synthesis
Haobin Tang
Xulong Zhang
Ning Cheng
Jing Xiao
Jianzong Wang
21
11
0
16 Jan 2024
DurFlex-EVC: Duration-Flexible Emotional Voice Conversion Leveraging Discrete Representations without Text Alignment
Hyoung-Seok Oh
Sang-Hoon Lee
Deok-Hyun Cho
Seong-Whan Lee
52
1
0
16 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
19
36
0
14 Jan 2024
SonicVisionLM: Playing Sound with Vision Language Models
Zhifeng Xie
Shengye Yu
Qile He
Mengtian Li
VLM
VGen
28
2
0
09 Jan 2024
DiffusionEdge: Diffusion Probabilistic Model for Crisp Edge Detection
Yunfan Ye
K. Xu
Yuhang Huang
Renjiao Yi
Zhiping Cai
34
35
0
04 Jan 2024
Utilizing Neural Transducers for Two-Stage Text-to-Speech via Semantic Token Prediction
Minchan Kim
Myeonghun Jeong
Byoung Jin Choi
Semin Kim
Joun Yeop Lee
Nam Soo Kim
AI4TS
25
4
0
03 Jan 2024