2104.09995
Review of end-to-end speech synthesis technology based on deep learning
20 April 2021
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
Papers citing "Review of end-to-end speech synthesis technology based on deep learning"
50 / 51 papers shown
Title
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input
Brooke Stephenson
Thomas Hueber
Laurent Girin
Laurent Besacier
54
10
0
19 Feb 2021
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
42
73
0
17 Nov 2020
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts
Thomas Merritt
Giulia Comini
Bartek Perz
Raahil Shah
Jaime Lorenzo-Trueba
38
52
0
11 Nov 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
61
219
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
56
103
0
22 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
160
1,918
0
12 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Wei Ping
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
97
1,429
0
21 Sep 2020
SpeedySpeech: Efficient Neural Speech Synthesis
Jan Vainer
Ondrej Dusek
41
42
0
09 Aug 2020
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
Tomás Nekvinda
Ondrej Dusek
55
57
0
03 Aug 2020
Improved Techniques for Training Score-Based Generative Models
Yang Song
Stefano Ermon
DiffM
175
1,135
0
16 Jun 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction
Adrian Łańcucki
66
339
0
11 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
103
1,382
0
08 Jun 2020
End-to-End Adversarial Text-to-Speech
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
60
186
0
05 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
79
489
0
22 May 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei Chen
Lei Xie
109
199
0
11 May 2020
Unsupervised Speech Decomposition via Triple Information Bottleneck
Kaizhi Qian
Yang Zhang
Shiyu Chang
David D. Cox
M. Hasegawa-Johnson
59
178
0
23 Apr 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
VLM
51
56
0
04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
40
130
0
06 Feb 2020
WaveFlow: A Compact Flow-based Model for Raw Audio
Wei Ping
Kainan Peng
Kexin Zhao
Z. Song
71
117
0
03 Dec 2019
Emotional speech synthesis with rich and granularized control
Seyun Um
Sangshin Oh
Kyungguen Byun
Inseon Jang
C. Ahn
Hong-Goo Kang
38
89
0
05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
60
149
0
26 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
75
1,847
0
23 Sep 2019
Maximizing Mutual Information for Tacotron
Peng Liu
Xixin Wu
Shiyin Kang
Guangzhi Li
Dan Su
Dong Yu
46
16
0
30 Aug 2019
Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion
Joan Serrà
Santiago Pascual
Carlos Segura
CVBM
55
84
0
03 Jun 2019
Sliced Score Matching: A Scalable Approach to Density and Score Estimation
Yang Song
Sahaj Garg
Jiaxin Shi
Stefano Ermon
78
409
0
17 May 2019
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Kaizhi Qian
Yang Zhang
Shiyu Chang
Xuesong Yang
M. Hasegawa-Johnson
64
461
0
14 May 2019
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning
Tao Tu
Yuan-Jui Chen
Cheng-chieh Yeh
Hung-yi Lee
41
87
0
13 Apr 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
80
933
0
05 Apr 2019
An Attentive Survey of Attention Models
S. Chaudhari
Varun Mithal
Gungor Polatkan
R. Ramanath
118
649
0
05 Apr 2019
Learning latent representations for style control and transfer in end-to-end speech synthesis
Ya-Jie Zhang
Shifeng Pan
Lei He
Zhenhua Ling
BDL
SSL
DRL
46
228
0
11 Dec 2018
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R. Prenger
Rafael Valle
Bryan Catanzaro
144
1,024
0
31 Oct 2018
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
J. Valin
Jan Skoglund
59
450
0
28 Oct 2018
Meta-Learning for Low-Resource Neural Machine Translation
Jiatao Gu
Yong Wang
Yun Chen
Kyunghyun Cho
Victor O.K. Li
74
342
0
25 Aug 2018
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma
Prafulla Dhariwal
BDL
DRL
240
3,110
0
09 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
249
826
0
12 Jun 2018
Machine Speech Chain with One-shot Speaker Adaptation
Andros Tjandra
S. Sakti
Satoshi Nakamura
58
55
0
28 Mar 2018
Demystifying MMD GANs
Mikolaj Binkowski
Danica J. Sutherland
Michael Arbel
Arthur Gretton
EGVM
102
1,478
0
04 Jan 2018
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Aaron van den Oord
Yazhe Li
Igor Babuschkin
Karen Simonyan
Oriol Vinyals
...
Alex Graves
Helen King
T. Walters
Dan Belov
Demis Hassabis
175
858
0
28 Nov 2017
Listening while Speaking: Speech Chain by Deep Learning
Andros Tjandra
S. Sakti
Satoshi Nakamura
AuLLM
147
165
0
16 Jul 2017
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
...
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
150
1,817
0
29 Mar 2017
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Soroush Mehri
Kundan Kumar
Ishaan Gulrajani
Rithesh Kumar
Shubham Jain
Jose M. R. Sotelo
Aaron Courville
Yoshua Bengio
88
597
0
22 Dec 2016
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
333
7,361
0
12 Sep 2016
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Heiga Zen
Yannis Agiomyrgiannakis
Niels Egberts
Fergus Henderson
Przemyslaw Szczepaniak
41
118
0
20 Jun 2016
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Justin Johnson
Alexandre Alahi
Li Fei-Fei
SupR
201
10,202
0
27 Mar 2016
Pixel Recurrent Neural Networks
Aaron van den Oord
Nal Kalchbrenner
Koray Kavukcuoglu
SSeg
GAN
419
2,563
0
25 Jan 2016
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
174
7,683
0
31 Aug 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
298
10,034
0
10 Feb 2015
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
374
12,662
0
11 Dec 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
422
27,205
0
01 Sep 2014
Auto-Encoding Variational Bayes
Diederik P. Kingma
Max Welling
BDL
395
16,962
0
20 Dec 2013