Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

23 March 2018

Yuxuan Wang

Rif A. Saurous

Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

50 / 195 papers shown

Title
Emotional Prosody Control for Speech Generation S. Sivaprasad Saiteja Kosgi Vineet Gandhi 12 17 0 07 Nov 2021
Zero-shot Voice Conversion via Self-supervised Prosody Representation Learning Shijun Wang Dimche Kostadinov Damian Borth 29 11 0 27 Oct 2021
DelightfulTTS: The Microsoft Speech Synthesis System for Blizzard Challenge 2021 Yanqing Liu Rui Shao G. Wang Kuan Chen Bohan Li Pong C. Yuen Jinzhu Li Lei He Sheng Zhao 39 55 0 25 Oct 2021
Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition Ting-Yao Hu Mohammadreza Armandpour A. Shrivastava Jen-Hao Rick Chang H. Koppula Oncel Tuzel SyDa 60 42 0 21 Oct 2021
ESPnet2-TTS: Extending the Edge of TTS Research Tomoki Hayashi Ryuichi Yamamoto Takenori Yoshimura Peter Wu Jiatong Shi Takaaki Saeki Yooncheol Ju Yusuke Yasuda Shinnosuke Takamichi Shinji Watanabe VLM 52 60 0 15 Oct 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis Li-Wei Chen Alexander I. Rudnicky 88 29 0 12 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech Pengfei Wu Junjie Pan Chenchang Xu Junhui Zhang Lin Wu Xiang Yin Zejun Ma 18 16 0 08 Oct 2021
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS T. Raitio Jiangchuan Li Shreyas Seshadri 34 22 0 06 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models Jen-Hao Rick Chang A. Shrivastava H. Koppula Xiaoshuai Zhang Oncel Tuzel DiffM 51 16 0 06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks E. Hortal Rodrigo Brechard Alarcia GAN 26 2 0 06 Oct 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network Takaaki Saeki Shinnosuke Takamichi Hiroshi Saruwatari 34 3 0 22 Sep 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Shifeng Pan Lei He 19 22 0 27 Jul 2021
Generative Pretraining for Paraphrase Evaluation J. Weston R. Lenain U. Meepegama E. Fristed AIMat 27 10 0 17 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio J. Weston R. Lenain U. Meepegama E. Fristed SSL 32 15 0 17 Jul 2021
A Survey on Neural Speech Synthesis Xu Tan Tao Qin Frank Soong Tie-Yan Liu AI4TS 18 353 0 29 Jun 2021
Speech is Silver, Silence is Golden: What do ASVspoof-trained Models Really Learn? Nicolas Müller Franziska Dieckmann Pavel Czempin Roman Canals Konstantin Böttinger Jennifer Williams 35 71 0 23 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control M. Kang Sungjae Kim Injung Kim 26 3 0 21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis Jian Cong Shan Yang Na Hu Guangzhi Li Lei Xie Dan Su 20 30 0 21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 42 9 0 18 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model Chenye Cui Yi Ren Jinglin Liu Feiyang Chen Rongjie Huang Ming Lei Zhou Zhao 24 35 0 17 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis D. Mohan Qinmin Hu Tian Huey Teh Alexandra Torresquintero C. Wallis Marlene Staib Lorenzo Foglianti Jiameng Gao Simon King 25 16 0 15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system Panagiota Karanasou S. Karlapati Alexis Moinet Arnaud Joly Ammar Abbas Simon Slangen Jaime Lorenzo-Trueba Thomas Drugman 35 7 0 14 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling Jingbei Li Yi Meng Chenyi Li Zhiyong Wu Helen Meng Chao Weng Dan Su 31 24 0 11 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning Rayhane Mama Marc S. Tyndel Hashiam Kadhim Cole Clifford Ragavan Thurairatnam VGen 29 12 0 08 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation Dong Min Dong Bok Lee Eunho Yang Sung Ju Hwang 25 160 0 06 Jun 2021
Learning Robust Latent Representations for Controllable Speech Synthesis Shakti Kumar Jithin Pradeep Hussain Zaidi DRL 41 6 0 10 May 2021
Exploring emotional prototypes in a high dimensional TTS latent space Pol van Rijn Silvan Mertes Dominik Schiller Peter M. C. Harrison P. Larrouy-Maestri Elisabeth André Nori Jacoby 28 12 0 05 May 2021
Review of end-to-end speech synthesis technology based on deep learning Zhaoxi Mu Xinyu Yang Yizhuo Dong AuLLM ALM 26 24 0 20 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis Yixuan Zhou Changhe Song Jingbei Li Zhiyong Wu Yanyao Bian Dan Su Helen Meng 41 6 0 14 Apr 2021
Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures Nick Rossenbach Mohammad Zeineldeen Benedikt Hilmes Ralf Schluter Hermann Ney 36 12 0 12 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis Xiang Li Changhe Song Jingbei Li Zhiyong Wu Jia Jia Helen Meng 19 47 0 08 Apr 2021
Attention Forcing for Machine Translation Qingyun Dou Yiting Lu Potsawee Manakul Xixin Wu Mark Gales 31 7 0 02 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS Ye Jia Heiga Zen Jonathan Shen Yu Zhang Yonghui Wu SSL 47 81 0 28 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech C. Chien Jheng-hao Lin Chien-yu Huang Po-Chun Hsu Hung-yi Lee 27 68 0 06 Mar 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis Neeraj Kumar Srishti Goel Ankur Narang Brejesh Lall 29 5 0 14 Dec 2020
DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis Anurag Chowdhury Arun Ross Prabu David 16 5 0 09 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis Aolan Sun Jianzong Wang Ning Cheng Huayi Peng Zhen Zeng Lingwei Kong Jing Xiao 16 9 0 03 Dec 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis Tao Li Shan Yang Liumeng Xue Lei Xie 28 73 0 17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 27 55 0 17 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis C. Chien Hung-yi Lee 32 36 0 12 Nov 2020
Spoken Language Interaction with Robots: Research Issues and Recommendations, Report from the NSF Future Directions Workshop M. Marge C. Espy-Wilson Roger K. Moore 26 78 0 11 Nov 2020
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model Haoyu Li Yang Ai Junichi Yamagishi 17 2 0 10 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement Daxin Tan Tan Lee 29 21 0 08 Nov 2020
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition Xiong Cai Dongyang Dai Zhiyong Wu Xiang Li Jingbei Li Helen Meng 14 66 0 26 Oct 2020
Unsupervised Learning of Disentangled Speech Content and Style Representation Andros Tjandra Ruoming Pang Yu Zhang Shigeki Karita BDL DRL 23 15 0 24 Oct 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines Yao Shi Hui Bu Xin Xu Shaojing Zhang Ming Li 35 219 0 22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS Isaac Elias Heiga Zen Jonathan Shen Yu Zhang Ye Jia Ron J. Weiss Yonghui Wu DRL 30 102 0 22 Oct 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features T. Raitio Ramya Rasipuram D. Castellani 42 66 0 14 Sep 2020
Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling Yeunju Choi Youngmoon Jung Hoirin Kim 19 26 0 09 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 39 73 0 04 Aug 2020