Controllable speech synthesis by learning discrete phoneme-level prosodic representations

29 November 2022

Aimilios Chalamandaris

Pirros Tsiakoulis

P. Mastorocostas

Papers citing "Controllable speech synthesis by learning discrete phoneme-level prosodic representations"

Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 37 11 0 19 Nov 2021
Word-Level Style Control for Expressive, Non-attentive Speech Synthesis Konstantinos Klapsas Nikolaos Ellinas June Sig Sung Hyoungmin Park S. Raptis 116 9 0 19 Nov 2021
Improved Prosodic Clustering for Multispeaker and Speaker-independent Phoneme-level Prosody Control Myrsini Christidou Alexandra Vioni Nikolaos Ellinas G. Vamvoukakis K. Markopoulos Panos Kakoulidis June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 38 4 0 19 Nov 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis Li-Wei Chen Alexander I. Rudnicky 113 31 0 12 Oct 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis Taejun Bak Jaesung Bae Hanbin Bae Young-Ik Kim Hoon-Young Cho 105 17 0 29 Jun 2021
Speech BERT Embedding For Improving Prosody in Neural TTS Liping Chen Yan Deng Xi Wang Frank Soong Lei He 60 23 0 08 Jun 2021
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech S. Karlapati Ammar Abbas Zack Hodari Alexis Moinet Arnaud Joly Panagiota Karanasou Thomas Drugman 56 19 0 04 Nov 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features T. Raitio Ramya Rasipuram D. Castellani 56 66 0 14 Sep 2020
Bunched LPCNet : Vocoder for Low-cost Neural Text-To-Speech Systems Ravichander Vipperla Sangjun Park Kihyun Choo Samin S. Ishtiaq Kyoungbo Min S. Bhattacharya Abhinav Mehrotra Alberto Gil C. P. Ramos Nicholas D. Lane 58 26 0 11 Aug 2020
wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations Alexei Baevski Henry Zhou Abdel-rahman Mohamed Michael Auli SSL 280 5,790 0 20 Jun 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction Adrian Łańcucki 68 340 0 11 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 105 1,396 0 08 Jun 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuanbin Cao Heiga Zen Yonghui Wu 48 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 68 93 0 06 Feb 2020
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens Rafael Valle Jason Chun Lok Li R. Prenger Bryan Catanzaro 64 149 0 26 Oct 2019
Learning to Speak Fluently in a Foreign Language: Multilingual Speech Synthesis and Cross-Language Voice Cloning Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu Zhiwen Chen RJ Skerry-Ryan Ye Jia Andrew Rosenberg Bhuvana Ramabhadran 45 188 0 09 Jul 2019
Fine-grained robust prosody transfer for single-speaker neural text-to-speech V. Klimkov S. Ronanki Jonas Rohnke Thomas Drugman AI4TS 64 82 0 04 Jul 2019
MelNet: A Generative Model for Audio in the Frequency Domain Sean Vasquez M. Lewis DiffM 56 131 0 04 Jun 2019
Learning latent representations for style control and transfer in end-to-end speech synthesis Ya-Jie Zhang Shifeng Pan Lei He Zhenhua Ling BDL SSL DRL 48 229 0 11 Dec 2018
Robust and fine-grained prosody control of end-to-end speech synthesis Younggun Lee Jonathan Le Roux 44 147 0 06 Nov 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis Wei-Ning Hsu Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu ... Ye Jia Zhiwen Chen Jonathan Shen Patrick Nguyen Ruoming Pang BDL 69 275 0 16 Oct 2018