Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

23 March 2018

Yuxuan Wang

Rif A. Saurous

Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

44 / 194 papers shown

Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis Fengyu Yang Shan Yang Qinghua Wu Yujun Wang Lei Xie 39 5 0 03 Aug 2020
Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling Hao Hao Tan Dorien Herremans MGen 16 72 0 29 Jul 2020
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis Antti Suni Sofoklis Kakouros M. Vainio J. Šimko 19 17 0 29 Jun 2020
Neural voice cloning with a few low-quality samples Sunghee Jung Hoi-Rim Kim 33 2 0 12 Jun 2020
Deep generative models for musical audio synthesis M. Huzaifah L. Wyse 27 20 0 10 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Jaehyeon Kim Sungwon Kim Jungil Kong Sungroh Yoon 54 477 0 22 May 2020
Pitchtron: Towards audiobook generation from ordinary people's voices Sunghee Jung Hoi-Rim Kim 16 5 0 21 May 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis Yusuke Yasuda Xin Wang Junichi Yamagishi AI4TS 22 31 0 20 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Seungwoo Choi Seungju Han Dongyoung Kim S. Ha 37 65 0 18 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation A. Laptev Roman Korostik A. Svischev A. Andrusenko Ivan Medennikov S. Rybin 16 61 0 14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis Rafael Valle Kevin J. Shih R. Prenger Bryan Catanzaro 21 119 0 12 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint Zexin Cai Chuxiong Zhang Ming Li 24 41 0 10 May 2020
Jukebox: A Generative Model for Music Prafulla Dhariwal Heewoo Jun Christine Payne Jong Wook Kim Alec Radford Ilya Sutskever VLM 55 724 0 30 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis Ting-Yao Hu A. Shrivastava Oncel Tuzel C. Dhir 11 30 0 09 Mar 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Jing Xiao 24 21 0 04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Yonghui Wu 16 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 36 92 0 06 Feb 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems Nick Rossenbach Albert Zeyer Ralf Schlüter Hermann Ney 18 83 0 19 Dec 2019
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis Junjie Pan Xiang Yin Zhiling Zhang Shichao Liu Yang Zhang Zejun Ma Yuxuan Wang 9 26 0 11 Nov 2019
On Investigation of Unsupervised Speech Factorization Based on Normalization Flow Haoran Sun Yunqi Cai Lantian Li Dong Wang 21 1 0 29 Oct 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens Rafael Valle Jason Chun Lok Li R. Prenger Bryan Catanzaro 25 148 0 26 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency M. Whitehill Shuang Ma Daniel J. McDuff Yale Song 25 35 0 25 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit Tomoki Hayashi Ryuichi Yamamoto Katsuki Inoue Takenori Yoshimura Shinji Watanabe T. Toda K. Takeda Yu Zhang Xu Tan VLM 29 202 0 24 Oct 2019
Attention Forcing for Sequence-to-sequence Model Training Qingyun Dou Yiting Lu Joshua Efiong Mark Gales 27 6 0 26 Sep 2019
Speech Recognition with Augmented Synthesized Speech Andrew Rosenberg Yu Zhang Bhuvana Ramabhadran Ye Jia Pedro J. Moreno Yonghui Wu Zelin Wu 32 127 0 25 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis Chengzhu Yu Heng Lu Na Hu Meng Yu Chao Weng ... Deyi Tuo Shiyin Kang Guangzhi Lei Dan Su Dong Yu CVBM 19 118 0 04 Sep 2019
Maximizing Mutual Information for Tacotron Peng Liu Xixin Wu Shiyin Kang Guangzhi Li Dan Su Dong Yu 22 16 0 30 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck Shuang Ma Daniel J. McDuff Yale Song 25 22 0 19 Aug 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach Noé Tits 16 10 0 05 Jul 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training Peng Wu Zhenhua Ling Li-Juan Liu Yuan Jiang Hong-Chuan Wu Lirong Dai 10 72 0 26 Jun 2019
Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders Yin-Jyun Luo Kat R. Agres Dorien Herremans 22 46 0 19 Jun 2019
Using generative modelling to produce varied intonation for speech synthesis Zack Hodari O. Watts Simon King 29 29 0 10 Jun 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network V. Wan Chun-an Chan Tom Kenter Jakub Vít R. Clark 24 75 0 17 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition Yi Ren Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 44 101 0 13 May 2019
Incorporating Symbolic Sequential Modeling for Speech Enhancement Chien-Feng Liao Yu Tsao Xugang Lu Hisashi Kawai 27 18 0 30 Apr 2019
Direct speech-to-speech translation with a sequence-to-sequence model Ye Jia Ron J. Weiss Fadi Biadsy Wolfgang Macherey Melvin Johnson Zhehuai Chen Yonghui Wu 21 223 0 12 Apr 2019
Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis Yanyao Bian Changbin Chen Yongguo Kang Zhenglin Pan 18 46 0 04 Apr 2019
Robust and fine-grained prosody control of end-to-end speech synthesis Younggun Lee Jonathan Le Roux 9 147 0 06 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation Ye Jia Melvin Johnson Wolfgang Macherey Ron J. Weiss Yuan Cao Chung-Cheng Chiu Naveen Ari Stella Laurenzo Yonghui Wu 31 159 0 05 Nov 2018
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis Yu-An Chung Yuxuan Wang Wei-Ning Hsu Yu Zhang RJ Skerry-Ryan 24 117 0 30 Aug 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis Daisy Stanton Yuxuan Wang RJ Skerry-Ryan 18 122 0 04 Aug 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Zhehuai Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 207 820 0 12 Jun 2018
Conditional End-to-End Audio Transforms Albert Haque Michelle Guo Prateek Verma 33 41 0 30 Mar 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron RJ Skerry-Ryan Eric Battenberg Y. Xiao Yuxuan Wang Daisy Stanton Joel Shor Ron J. Weiss R. Clark Rif A. Saurous 16 548 0 24 Mar 2018