Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

24 March 2018

Yuxuan Wang

Rif A. Saurous

Papers citing "Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron"

39 / 139 papers shown

Title
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis Rui Liu Berrak Sisman Haizhou Li 18 24 0 23 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS Isaac Elias Heiga Zen Jonathan Shen Yu Zhang Ye Jia Ron J. Weiss Yonghui Wu DRL 30 102 0 22 Oct 2020
Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction Daria Soboleva Ondrej Skopek Márius vSajgalík Victor Cuarbune Felix Weissenberger ... B. Prisacari Daniel Valcarce Justin Lu Rohit Prabhavalkar Balint Miklos 39 9 0 20 Oct 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features T. Raitio Ramya Rasipuram D. Castellani 42 66 0 14 Sep 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning Berrak Sisman Junichi Yamagishi Simon King Haizhou Li BDL 43 318 0 09 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 42 73 0 04 Aug 2020
Audiovisual Speech Synthesis using Tacotron2 Ahmed Hussen Abdelaziz Anushree Prasanna Kumar Chloe Seivwright Gabriele Fanelli Justin Binder Y. Stylianou S. Kajarekar 20 15 0 03 Aug 2020
Deep generative models for musical audio synthesis M. Huzaifah L. Wyse 27 20 0 10 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer Mingjian Chen Xu Tan Yi Ren Jin Xu Hao Sun Sheng Zhao Tao Qin Tie-Yan Liu 26 109 0 08 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Jaehyeon Kim Sungwon Kim Jungil Kong Sungroh Yoon 54 478 0 22 May 2020
Pitchtron: Towards audiobook generation from ordinary people's voices Sunghee Jung Hoi-Rim Kim 16 5 0 21 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Seungwoo Choi Seungju Han Dongyoung Kim S. Ha 37 66 0 18 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis Rafael Valle Kevin J. Shih R. Prenger Bryan Catanzaro 23 119 0 12 May 2020
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data Seung-won Park Doo-young Kim Myun-chul Joe 29 40 0 07 May 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Jing Xiao 24 21 0 04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuanbin Cao Heiga Zen Yonghui Wu 16 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 36 92 0 06 Feb 2020
Emotional Voice Conversion using Multitask Learning with Text-to-speech Tae-Ho Kim Sungjae Cho Shinkook Choi Sejik Park Soo-Young Lee 27 37 0 11 Nov 2019
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis Junjie Pan Xiang Yin Zhiling Zhang Shichao Liu Yang Zhang Zejun Ma Yuxuan Wang 14 26 0 11 Nov 2019
Teacher-Student Training for Robust Tacotron-based TTS Rui Liu Berrak Sisman Jingdong Li F. Bao Guanglai Gao Haizhou Li 19 38 0 07 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens Rafael Valle Jason Chun Lok Li R. Prenger Bryan Catanzaro 25 148 0 26 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency M. Whitehill Shuang Ma Daniel J. McDuff Yale Song 28 35 0 25 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit Tomoki Hayashi Ryuichi Yamamoto Katsuki Inoue Takenori Yoshimura Shinji Watanabe T. Toda K. Takeda Yu Zhang Xu Tan VLM 29 202 0 24 Oct 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis Eric Battenberg RJ Skerry-Ryan Soroosh Mariooryad Daisy Stanton David Kao Matt Shannon Tom Bagby 33 113 0 23 Oct 2019
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities Slava Shechtman A. Sorin 22 33 0 23 Sep 2019
A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities Deepali Aneja Daniel J. McDuff S. Shah 14 38 0 19 Sep 2019
Maximizing Mutual Information for Tacotron Peng Liu Xixin Wu Shiyin Kang Guangzhi Li Dan Su Dong Yu 22 16 0 30 Aug 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach Noé Tits 16 10 0 05 Jul 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training Peng Wu Zhenhua Ling Li-Juan Liu Yuan Jiang Hong-Chuan Wu Lirong Dai 10 72 0 26 Jun 2019
ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent Generative Adversarial Networks Xiaoqi Jia Jianwei Tai Hang Zhou Yakai Li Weijuan Zhang Haichao Du Qingjia Huang GAN 22 6 0 27 May 2019
Building a mixed-lingual neural TTS system with only monolingual data Liumeng Xue Wei Song Guanghui Xu Lei Xie Zhizheng Wu 17 30 0 12 Apr 2019
Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis Yanyao Bian Changbin Chen Yongguo Kang Zhenglin Pan 18 46 0 04 Apr 2019
Style Transfer and Extraction for the Handwritten Letters Using Deep Learning Omar Mohammed Gérard Bailly D. Pellier 12 1 0 10 Dec 2018
Robust and fine-grained prosody control of end-to-end speech synthesis Younggun Lee Jonathan Le Roux 11 147 0 06 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation Ye Jia Melvin Johnson Wolfgang Macherey Ron J. Weiss Yuan Cao Chung-Cheng Chiu Naveen Ari Stella Laurenzo Yonghui Wu 31 159 0 05 Nov 2018
Sample Efficient Adaptive Text-to-Speech Yutian Chen Yannis Assael Brendan Shillingford David Budden Scott E. Reed ... Ben Laurie Çağlar Gülçehre Aaron van den Oord Oriol Vinyals Nando de Freitas 35 149 0 27 Sep 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis Daisy Stanton Yuxuan Wang RJ Skerry-Ryan 18 122 0 04 Aug 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Zhehuai Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 207 821 0 12 Jun 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 38 815 0 23 Mar 2018