Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

24 March 2018

Yuxuan Wang

Rif A. Saurous

Papers citing "Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron"

50 / 139 papers shown

GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech Rongjie Huang Yi Ren Jinglin Liu Chenye Cui Zhou Zhao OODD VLM 117 34 0 15 May 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022 Jiameng Gao 30 0 0 08 Apr 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Jiankun Hu Zhiyong Wu Shiyin Kang Helen Meng 27 10 0 06 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios Yihan Wu Xu Tan Bohan Li Lei He Sheng Zhao Ruihua Song Tao Qin Tie-Yan Liu VLM DiffM 19 67 0 01 Apr 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis Shunwei Lei Yixuan Zhou Liyang Chen Zhiyong Wu Shiyin Kang Helen Meng 28 12 0 23 Mar 2022
SpeechPainter: Text-conditioned Speech Inpainting Zalan Borsos Matthew Sharifi Marco Tagliasacchi 16 26 0 15 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer Xiaochun An Frank Soong Lei Xie 70 18 0 24 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis Yu Wang Xinsheng Wang Pengcheng Zhu Jie Wu Hanzhao Li Heyang Xue Yongmao Zhang Lei Xie Mengxiao Bi 25 97 0 19 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis Yinjiao Lei Shan Yang Xinsheng Wang Lei Xie 27 73 0 17 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion Wendong Gan Bolong Wen Yin Yan Haitao Chen Zhichao Wang Hongqiang Du Lei Xie Kaixuan Guo Hai Li 15 14 0 02 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios Qicong Xie Tao Li Xinsheng Wang Zhichao Wang Lei Xie Guoqiao Yu Guanglu Wan 32 11 0 23 Dec 2021
V2C: Visual Voice Cloning Qi Chen Yuanqing Li Yuankai Qi Jiaqiu Zhou Mingkui Tan Qi Wu VGen 33 24 0 25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis Alexandra Vioni Myrsini Christidou Nikolaos Ellinas G. Vamvoukakis Panos Kakoulidis Taehoon Kim June Sig Sung Hyoungmin Park Aimilios Chalamandaris Pirros Tsiakoulis 19 11 0 19 Nov 2021
Textless Speech Emotion Conversion using Discrete and Decomposed Representations Felix Kreuk Adam Polyak Jade Copet Eugene Kharitonov Tu Nguyen M. Rivière Wei-Ning Hsu Abdel-rahman Mohamed Emmanuel Dupoux Yossi Adi 25 30 0 14 Nov 2021
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 12 17 0 07 Nov 2021
Discrete Acoustic Space for an Efficient Sampling in Neural Text-To-Speech Mu Li Jonas Rohnke Antonio Bonafonte Mateusz Lajszczak Trevor Wood DRL 30 2 0 24 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis Li-Wei Chen Alexander I. Rudnicky 88 30 0 12 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Pengfei Wu Junjie Pan Chenchang Xu Junhui Zhang Lin Wu Xiang Yin Zejun Ma 18 16 0 08 Oct 2021
Attention is All You Need? Good Embeddings with Statistics are enough: Large Scale Audio Understanding without Transformers/ Convolutions/ BERTs/ Mixers/ Attention/ RNNs or .... Prateek Verma AI4TS 32 2 0 07 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 37 22 0 06 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Jen-Hao Rick Chang A. Shrivastava H. Koppula Xiaoshuai Zhang Oncel Tuzel DiffM 51 16 0 06 Oct 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints Ji-Hoon Kim Sang-Hoon Lee Ji-Hyun Lee Hong G Jung Seong-Whan Lee 47 6 0 16 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Shifeng Pan Lei He 22 22 0 27 Jul 2021
Generative Pretraining for Paraphrase Evaluation J. Weston R. Lenain U. Meepegama E. Fristed AIMat 27 10 0 17 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio J. Weston R. Lenain U. Meepegama E. Fristed SSL 32 15 0 17 Jul 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech Ammar Abbas Bajibabu Bollepalli Alexis Moinet Arnaud Joly Penny Karanasou Peter Makarov Simon Slangen S. Karlapati Thomas Drugman 21 0 0 29 Jun 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 353 0 29 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 26 3 0 21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Dan Su 20 30 0 21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 44 9 0 18 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 25 16 0 15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 35 7 0 14 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling Jingbei Li Yi Meng Chenyi Li Zhiyong Wu Helen Meng Chao Weng Dan Su 31 24 0 11 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning Rayhane Mama Marc S. Tyndel Hashiam Kadhim Cole Clifford Ragavan Thurairatnam VGen 29 12 0 08 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dong Min Dong Bok Lee Eunho Yang Sung Ju Hwang 25 160 0 06 Jun 2021
Learning Robust Latent Representations for Controllable Speech Synthesis Shakti Kumar Jithin Pradeep Hussain Zaidi DRL 41 6 0 10 May 2021
Exploring emotional prototypes in a high dimensional TTS latent space Pol van Rijn Silvan Mertes Dominik Schiller Peter M. C. Harrison P. Larrouy-Maestri Elisabeth André Nori Jacoby 28 12 0 05 May 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
Generalized Spoofing Detection Inspired from Audio Generation Artifacts Yang Gao Tyler Vuong Mahsa Elyasi Gaurav Bharaj Rita Singh 26 20 0 08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis Xiang Li Changhe Song Jingbei Li Zhiyong Wu Jia Jia Helen Meng 22 47 0 08 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS Ye Jia Heiga Zen Jonathan Shen Yu Zhang Yonghui Wu SSL 50 81 0 28 Mar 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis Neeraj Kumar Srishti Goel Ankur Narang Brejesh Lall 29 5 0 14 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Lingwei Kong Jing Xiao 16 9 0 03 Dec 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis Tao Li Shan Yang Liumeng Xue Lei Xie 28 73 0 17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 27 55 0 17 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis C. Chien Hung-yi Lee 32 36 0 12 Nov 2020
Low-resource expressive text-to-speech using data augmentation Goeric Huybrechts Thomas Merritt Giulia Comini Bartek Perz Raahil Shah Jaime Lorenzo-Trueba 26 50 0 11 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 29 21 0 08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis Ron J. Weiss RJ Skerry-Ryan Eric Battenberg Soroosh Mariooryad Diederik P. Kingma 24 98 0 06 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech S. Karlapati Ammar Abbas Zack Hodari Alexis Moinet Arnaud Joly Panagiota Karanasou Thomas Drugman 28 19 0 04 Nov 2020