Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.13195
Cited By
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
22 May 2023
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
Re-assign community
ArXiv
PDF
HTML
Papers citing
"U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech"
16 / 16 papers shown
Title
SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li
Zengqiang Shang
Hua Hua
Peiyang Shi
Chen Yang
Li Wang
Pengyuan Zhang
110
2
0
16 Oct 2024
ResGrad: Residual Denoising Diffusion Probabilistic Models for Text to Speech
Ze Chen
Yihan Wu
Yichong Leng
Jiawei Chen
Haohe Liu
...
Ke Wang
Lei He
Sheng Zhao
Jiang Bian
Danilo Mandic
DiffM
66
23
0
30 Dec 2022
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
77
2,275
0
19 Dec 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gokcce .Iymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
57
53
0
06 Oct 2022
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
92
532
0
13 May 2021
Diffusion Models Beat GANs on Image Synthesis
Prafulla Dhariwal
Alex Nichol
178
7,765
0
11 May 2021
Symbolic Music Generation with Diffusion Models
Gautam Mittal
Jesse Engel
Curtis Hawthorne
Ian Simon
MGen
DiffM
78
193
0
30 Mar 2021
Score-Based Generative Modeling through Stochastic Differential Equations
Yang Song
Jascha Narain Sohl-Dickstein
Diederik P. Kingma
Abhishek Kumar
Stefano Ermon
Ben Poole
DiffM
SyDa
292
6,409
0
26 Nov 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
162
1,928
0
12 Oct 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
105
1,393
0
08 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
81
491
0
22 May 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
168
1,074
0
21 Dec 2019
Generative Modeling by Estimating Gradients of the Data Distribution
Yang Song
Stefano Ermon
SyDa
DiffM
213
3,870
0
12 Jul 2019
Group Normalization
Yuxin Wu
Kaiming He
198
3,644
0
22 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
77
2,694
0
16 Dec 2017
Deep Unsupervised Learning using Nonequilibrium Thermodynamics
Jascha Narain Sohl-Dickstein
Eric A. Weiss
Niru Maheswaranathan
Surya Ganguli
SyDa
DiffM
255
6,887
0
12 Mar 2015
1