Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2005.05957
Cited By
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
12 May 2020
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis"
37 / 37 papers shown
Title
Muyan-TTS: A Trainable Text-to-Speech Model Optimized for Podcast Scenarios with a
50
K
B
u
d
g
e
t
50K Budget
50
K
B
u
d
g
e
t
Xin Li
Kaikai Jia
Hao Sun
Jun Dai
Z. L. Jiang
131
0
0
27 Apr 2025
Overview of Speaker Modeling and Its Applications: From the Lens of Deep Speaker Representation Learning
Shuai Wang
Zheng-Shou Chen
Kong Aik Lee
Yan-min Qian
Haizhou Li
39
4
0
21 Jul 2024
Improving Robustness of LLM-based Speech Synthesis by Learning Monotonic Alignment
Paarth Neekhara
Shehzeen Samarah Hussain
Subhankar Ghosh
Jason Chun Lok Li
Rafael Valle
Rohan Badlani
Boris Ginsburg
55
11
0
25 Jun 2024
Cognitively Inspired Energy-Based World Models
Alexi Gladstone
Ganesh Nanduru
Md. Mofijul Islam
Aman Chadha
Jundong Li
Tariq Iqbal
39
0
0
13 Jun 2024
Speaking in Wavelet Domain: A Simple and Efficient Approach to Speed up Speech Diffusion Model
Xiangyu Zhang
Daijiao Liu
Hexin Liu
Qiquan Zhang
Hanyu Meng
Leibny Paola García
Chng Eng Siong
Lina Yao
DiffM
25
2
0
16 Feb 2024
Data-driven grapheme-to-phoneme representations for a lexicon-free text-to-speech
Abhinav Garg
Jiyeon Kim
Sushil Khyalia
Chanwoo Kim
Dhananjaya N. Gowda
25
2
0
19 Jan 2024
Creating New Voices using Normalizing Flows
Piotr Bilinski
Thomas Merritt
Abdelhamid Ezzerg
Kamil Pokora
Sebastian Cygert
K. Yanagisawa
Roberto Barra-Chicote
Daniel Korzekwa
20
17
0
22 Dec 2023
DPP-TTS: Diversifying prosodic features of speech via determinantal point processes
Seongho Joo
Hyukhun Koh
Kyomin Jung
DiffM
47
0
0
23 Oct 2023
VITS2: Improving Quality and Efficiency of Single-Stage Text-to-Speech with Adversarial Learning and Architecture Design
Jungil Kong
Jihoon Park
Beomjeong Kim
Jeongmin Kim
Dohee Kong
Sangjin Kim
37
35
0
31 Jul 2023
LibriTTS-R: A Restored Multi-Speaker Text-to-Speech Corpus
Yuma Koizumi
Heiga Zen
Shigeki Karita
Yifan Ding
Kohei Yatabe
Nobuyuki Morioka
M. Bacchiani
Yu Zhang
Wei Han
Ankur Bapna
43
66
0
30 May 2023
Stochastic Pitch Prediction Improves the Diversity and Naturalness of Speech in Glow-TTS
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
DiffM
29
4
0
28 May 2023
U-DiT TTS: U-Diffusion Vision Transformer for Text-to-Speech
Xin Jing
Yi Chang
Zijiang Yang
Jiang-jian Xie
Andreas Triantafyllopoulos
Bjoern W. Schuller
41
10
0
22 May 2023
AI-Synthesized Voice Detection Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Siwei Lyu
32
38
0
25 Apr 2023
Affective social anthropomorphic intelligent system
Md. Adyelullahil Mamun
Hasnat Md. Abdullah
Md. Golam Rabiul Alam
Muhammad Mehedi Hassan
Md. Zia Uddin
17
1
0
19 Apr 2023
Do Prosody Transfer Models Transfer Prosody?
A. Sigurgeirsson
Simon King
DiffM
4
7
0
07 Mar 2023
An investigation into the adaptability of a diffusion-based TTS model
Haolin Chen
Philip N. Garner
DiffM
36
1
0
03 Mar 2023
Varianceflow: High-Quality and Controllable Text-to-Speech using Variance Information via Normalizing Flow
Yoonhyung Lee
Jinhyeok Yang
Kyomin Jung
22
6
0
27 Feb 2023
Exposing AI-Synthesized Human Voices Using Neural Vocoder Artifacts
Chengzhe Sun
Shan Jia
Shuwei Hou
Ehab AlBadawy
Siwei Lyu
130
3
0
18 Feb 2023
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
39
12
0
13 Nov 2022
Cover Reproducible Steganography via Deep Generative Models
Kejiang Chen
Hang Zhou
Yaofei Wang
Meng Li
Weiming Zhang
Neng H. Yu
DiffM
28
9
0
26 Oct 2022
Detecting Synthetic Speech Manipulation in Real Audio Recordings
M. Rahman
M. Graciarena
Diego Castán
Chris Cobo-Kroenke
Mitchell McLaren
A. Lawson
AAML
25
9
0
15 Sep 2022
The Role of Vocal Persona in Natural and Synthesized Speech
Camille Noufi
Lloyd May
J. Berger
25
2
0
06 Sep 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
36
38
0
30 May 2022
Distribution augmentation for low-resource expressive text-to-speech
Mateusz Lajszczak
Animesh Prasad
Arent van Korlaar
Bajibabu Bollepalli
A. Bonafonte
...
M. Nicolis
Alexis Moinet
Thomas Drugman
Trevor Wood
Elena Sokolova
30
7
0
13 Feb 2022
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
23
2
0
06 Oct 2021
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
37
19
0
25 Aug 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
77
842
0
11 Jun 2021
Giving Commands to a Self-Driving Car: How to Deal with Uncertain Situations?
Thierry Deruyttere
Victor Milewski
Marie-Francine Moens
30
15
0
08 Jun 2021
Exploring emotional prototypes in a high dimensional TTS latent space
Pol van Rijn
Silvan Mertes
Dominik Schiller
Peter M. C. Harrison
P. Larrouy-Maestri
Elisabeth André
Nori Jacoby
28
12
0
05 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
36
22
0
12 Feb 2021
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
21
97
0
06 Nov 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
34
1,392
0
21 Sep 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction
Adrian Lañcucki
27
332
0
11 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
31
473
0
22 May 2020
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
232
239
0
25 Sep 2019
1