arXiv:2104.09995
Review of end-to-end speech synthesis technology based on deep learning

20 April 2021
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
    AuLLM
    ALM

Papers citing "Review of end-to-end speech synthesis technology based on deep learning"

50 / 51 papers shown
Title
Alternate Endings: Improving Prosody for Incremental Neural TTS with Predicted Future Text Input
Brooke Stephenson
Thomas Hueber
Laurent Girin
Laurent Besacier
54
10
0
19 Feb 2021
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
42
73
0
17 Nov 2020
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts
Thomas Merritt
Giulia Comini
Bartek Perz
Raahil Shah
Jaime Lorenzo-Trueba
38
52
0
11 Nov 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
61
219
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
56
103
0
22 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
160
1,918
0
12 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Wei Ping
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
97
1,429
0
21 Sep 2020
SpeedySpeech: Efficient Neural Speech Synthesis
Jan Vainer
Ondrej Dusek
41
42
0
09 Aug 2020
One Model, Many Languages: Meta-learning for Multilingual Text-to-Speech
Tomás Nekvinda
Ondrej Dusek
55
57
0
03 Aug 2020
Improved Techniques for Training Score-Based Generative Models
Yang Song
Stefano Ermon
DiffM
175
1,135
0
16 Jun 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction
Adrian Łańcucki
66
339
0
11 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
103
1,382
0
08 Jun 2020
End-to-End Adversarial Text-to-Speech
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
60
186
0
05 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
79
489
0
22 May 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei Chen
Lei Xie
109
199
0
11 May 2020
Unsupervised Speech Decomposition via Triple Information Bottleneck
Kaizhi Qian
Yang Zhang
Shiyu Chang
David D. Cox
M. Hasegawa-Johnson
59
178
0
23 Apr 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
VLM
51
56
0
04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
40
130
0
06 Feb 2020
WaveFlow: A Compact Flow-based Model for Raw Audio
Wei Ping
Kainan Peng
Kexin Zhao
Z. Song
71
117
0
03 Dec 2019
Emotional speech synthesis with rich and granularized control
Seyun Um
Sangshin Oh
Kyungguen Byun
Inseon Jang
C. Ahn
Hong-Goo Kang
38
89
0
05 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
60
149
0
26 Oct 2019
TinyBERT: Distilling BERT for Natural Language Understanding
Xiaoqi Jiao
Yichun Yin
Lifeng Shang
Xin Jiang
Xiao Chen
Linlin Li
F. Wang
Qun Liu
VLM
75
1,847
0
23 Sep 2019
Maximizing Mutual Information for Tacotron
Peng Liu
Xixin Wu
Shiyin Kang
Guangzhi Li
Dan Su
Dong Yu
46
16
0
30 Aug 2019
Blow: a single-scale hyperconditioned flow for non-parallel raw-audio voice conversion
Joan Serrà
Santiago Pascual
Carlos Segura
CVBM
55
84
0
03 Jun 2019
Sliced Score Matching: A Scalable Approach to Density and Score Estimation
Yang Song
Sahaj Garg
Jiaxin Shi
Stefano Ermon
78
409
0
17 May 2019
AUTOVC: Zero-Shot Voice Style Transfer with Only Autoencoder Loss
Kaizhi Qian
Yang Zhang
Shiyu Chang
Xuesong Yang
M. Hasegawa-Johnson
64
461
0
14 May 2019
End-to-end Text-to-speech for Low-resource Languages by Cross-Lingual Transfer Learning
Tao Tu
Yuan-Jui Chen
Cheng-chieh Yeh
Hung-yi Lee
41
87
0
13 Apr 2019
LibriTTS: A Corpus Derived from LibriSpeech for Text-to-Speech
Heiga Zen
Viet Dang
R. Clark
Yu Zhang
Ron J. Weiss
Ye Jia
Zhiwen Chen
Yonghui Wu
80
933
0
05 Apr 2019
An Attentive Survey of Attention Models
S. Chaudhari
Varun Mithal
Gungor Polatkan
R. Ramanath
118
649
0
05 Apr 2019
Learning latent representations for style control and transfer in end-to-end speech synthesis
Ya-Jie Zhang
Shifeng Pan
Lei He
Zhenhua Ling
BDL
SSL
DRL
46
228
0
11 Dec 2018
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R. Prenger
Rafael Valle
Bryan Catanzaro
144
1,024
0
31 Oct 2018
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
J. Valin
Jan Skoglund
59
450
0
28 Oct 2018
Meta-Learning for Low-Resource Neural Machine Translation
Jiatao Gu
Yong Wang
Yun Chen
Kyunghyun Cho
Victor O.K. Li
74
342
0
25 Aug 2018
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma
Prafulla Dhariwal
BDL
DRL
240
3,110
0
09 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
249
826
0
12 Jun 2018
Machine Speech Chain with One-shot Speaker Adaptation
Andros Tjandra
S. Sakti
Satoshi Nakamura
58
55
0
28 Mar 2018
Demystifying MMD GANs
Mikolaj Binkowski
Danica J. Sutherland
Michael Arbel
Arthur Gretton
EGVM
102
1,478
0
04 Jan 2018
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Aaron van den Oord
Yazhe Li
Igor Babuschkin
Karen Simonyan
Oriol Vinyals
...
Alex Graves
Helen King
T. Walters
Dan Belov
Demis Hassabis
175
858
0
28 Nov 2017
Listening while Speaking: Speech Chain by Deep Learning
Andros Tjandra
S. Sakti
Satoshi Nakamura
AuLLM
147
165
0
16 Jul 2017
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
...
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
150
1,817
0
29 Mar 2017
SampleRNN: An Unconditional End-to-End Neural Audio Generation Model
Soroush Mehri
Kundan Kumar
Ishaan Gulrajani
Rithesh Kumar
Shubham Jain
Jose M. R. Sotelo
Aaron Courville
Yoshua Bengio
88
597
0
22 Dec 2016
WaveNet: A Generative Model for Raw Audio
Aaron van den Oord
Sander Dieleman
Heiga Zen
Karen Simonyan
Oriol Vinyals
Alex Graves
Nal Kalchbrenner
A. Senior
Koray Kavukcuoglu
DiffM
333
7,361
0
12 Sep 2016
Fast, Compact, and High Quality LSTM-RNN Based Statistical Parametric Speech Synthesizers for Mobile Devices
Heiga Zen
Yannis Agiomyrgiannakis
Niels Egberts
Fergus Henderson
Przemyslaw Szczepaniak
41
118
0
20 Jun 2016
Perceptual Losses for Real-Time Style Transfer and Super-Resolution
Justin Johnson
Alexandre Alahi
Li Fei-Fei
SupR
201
10,202
0
27 Mar 2016
Pixel Recurrent Neural Networks
Aaron van den Oord
Nal Kalchbrenner
Koray Kavukcuoglu
SSeg
GAN
419
2,563
0
25 Jan 2016
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
174
7,683
0
31 Aug 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
298
10,034
0
10 Feb 2015
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
374
12,662
0
11 Dec 2014
Neural Machine Translation by Jointly Learning to Align and Translate
Dzmitry Bahdanau
Kyunghyun Cho
Yoshua Bengio
AIMat
422
27,205
0
01 Sep 2014
Auto-Encoding Variational Bayes
Diederik P. Kingma
Max Welling
BDL
395
16,962
0
20 Dec 2013