Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis

1 November 2022

Aimilios Chalamandaris

Pirros Tsiakoulis

ArXiv PDF HTML

Papers citing "Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis"

25 / 25 papers shown

Title
Hierarchical and Multi-Scale Variational Autoencoder for Diverse and Natural Non-Autoregressive Text-to-Speech Jaesung Bae Jinhyeok Yang Taejun Bak Young-Sun Joo DiffM 106 6 0 08 Apr 2022
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 37 11 0 19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis Konstantinos Klapsas Nikolaos Ellinas June Sig Sung Hyoungmin Park S. Raptis 116 9 0 19 Nov 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 64 22 0 06 Oct 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech Jaehyeon Kim Jungil Kong Juhee Son DRL 114 882 0 11 Jun 2021
Parallel Tacotron: Non-Autoregressive and Controllable TTS Isaac Elias Heiga Zen Jonathan Shen Yu Zhang Ye Jia Ron J. Weiss Yonghui Wu DRL 60 103 0 22 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling Jonathan Shen Ye Jia Mike Chrzanowski Yu Zhang Isaac Elias Heiga Zen Yonghui Wu 54 112 0 08 Oct 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 105 1,396 0 08 Jun 2020
End-to-End Adversarial Text-to-Speech Jeff Donahue Sander Dieleman Mikolaj Binkowski Erich Elsen Karen Simonyan 66 186 0 05 Jun 2020
Decision-Making with Auto-Encoding Variational Bayes Romain Lopez Pierre Boyeau Nir Yosef Michael I. Jordan Jeffrey Regier BDL 380 10,591 0 17 Feb 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuanbin Cao Heiga Zen Yonghui Wu 48 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 68 93 0 06 Feb 2020
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities Slava Shechtman A. Sorin 47 33 0 23 Sep 2019
Learning latent representations for style control and transfer in end-to-end speech synthesis Ya-Jie Zhang Shifeng Pan Lei He Zhenhua Ling BDL SSL DRL 48 229 0 11 Dec 2018
Robust and fine-grained prosody control of end-to-end speech synthesis Younggun Lee Jonathan Le Roux 44 147 0 06 Nov 2018
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction J. Valin Jan Skoglund 65 451 0 28 Oct 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis Wei-Ning Hsu Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu ... Ye Jia Zhiwen Chen Jonathan Shen Patrick Nguyen Ruoming Pang BDL 66 275 0 16 Oct 2018
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 350 2,275 0 14 Jun 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron RJ Skerry-Ryan Eric Battenberg Y. Xiao Yuxuan Wang Daisy Stanton Joel Shor Ron J. Weiss R. Clark Rif A. Saurous 54 554 0 24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 64 826 0 23 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 77 2,697 0 16 Dec 2017
Neural Discrete Representation Learning Aaron van den Oord Oriol Vinyals Koray Kavukcuoglu BDL SSL OCL 222 5,004 0 02 Nov 2017
Generalized End-to-End Loss for Speaker Verification Li Wan Quan Wang Alan Papir Ignacio López Moreno VLM 66 926 0 28 Oct 2017
Attention-Based Models for Speech Recognition J. Chorowski Dzmitry Bahdanau Dmitriy Serdyuk Kyunghyun Cho Yoshua Bengio 121 2,606 0 24 Jun 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 1.7K 150,006 0 22 Dec 2014