Improving Prosody Modelling with Cross-Utterance BERT Embeddings for
End-to-end Speech Synthesis

6 November 2020

Zhengchen Zhang

Chao Zhang

Papers cited by "Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis"

Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Yonghui Wu 51 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 76 93 0 06 Feb 2020
Towards Transfer Learning for End-to-End Speech Synthesis from Deep Pre-Trained Language Models Wei Fang Yu-An Chung James R. Glass 51 27 0 17 Jun 2019
Learning latent representations for style control and transfer in end-to-end speech synthesis Ya-Jie Zhang Shifeng Pan Lei He Zhenhua Ling BDL SSL DRL 53 229 0 11 Dec 2018
Robust and fine-grained prosody control of end-to-end speech synthesis Younggun Lee Jonathan Le Roux 51 147 0 06 Nov 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis Wei-Ning Hsu Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu ... Ye Jia Zhiwen Chen Jonathan Shen Patrick Nguyen Ruoming Pang BDL 72 275 0 16 Oct 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 94,891 0 11 Oct 2018
Deep Encoder-Decoder Models for Unsupervised Learning of Controllable Speech Synthesis G. Henter Jaime Lorenzo-Trueba Xin Wang Junichi Yamagishi DRL SSL 59 61 0 30 Jul 2018
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder K. Akuzawa Yusuke Iwasawa Y. Matsuo 48 139 0 06 Apr 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron RJ Skerry-Ryan Eric Battenberg Y. Xiao Yuxuan Wang Daisy Stanton Joel Shor Ron J. Weiss R. Clark Rif A. Saurous 54 554 0 24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 66 826 0 23 Mar 2018
Efficient Neural Audio Synthesis Nal Kalchbrenner Erich Elsen Karen Simonyan Seb Noury Norman Casagrande Edward Lockhart Florian Stimberg Aaron van den Oord Sander Dieleman Koray Kavukcuoglu 89 867 0 23 Feb 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 79 2,698 0 16 Dec 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 701 131,652 0 12 Jun 2017
Tacotron: Towards End-to-End Speech Synthesis Yuxuan Wang RJ Skerry-Ryan Daisy Stanton Yonghui Wu Ron J. Weiss ... Samy Bengio Quoc V. Le Yannis Agiomyrgiannakis R. Clark Rif A. Saurous 160 1,825 0 29 Mar 2017
Sequence to Sequence Learning with Neural Networks Ilya Sutskever Oriol Vinyals Quoc V. Le AIMat 437 20,568 0 10 Sep 2014
Neural Machine Translation by Jointly Learning to Align and Translate Dzmitry Bahdanau Kyunghyun Cho Yoshua Bengio AIMat 558 27,311 0 01 Sep 2014
Auto-Encoding Variational Bayes Diederik P. Kingma Max Welling BDL 452 16,929 0 20 Dec 2013