Disentangling Style and Speaker Attributes for TTS Style Transfer

24 January 2022

Lei Xie

Papers citing "Disentangling Style and Speaker Attributes for TTS Style Transfer"

Title
Generative Adversarial Networks Gilad Cohen Raja Giryes GAN 217 30,089 0 01 Mar 2022
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis Shifeng Pan Lei He 49 23 0 27 Jul 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS Xiaochun An Frank Soong Lei Xie 80 9 0 18 Jun 2021
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification Li Zhang Qing Wang Kong Aik Lee Lei Xie Haizhou Li 47 13 0 17 Jun 2021
Controllable Emotion Transfer For End-to-End Speech Synthesis Tao Li Shan Yang Liumeng Xue Lei Xie 45 74 0 17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis Yinjiao Lei Shan Yang Lei Xie 52 56 0 17 Nov 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer Pengcheng Guo Florian Boyer Xuankai Chang Tomoki Hayashi Yosuke Higuchi ... Jing Shi Shinji Watanabe Kun Wei Wangyou Zhang Yuekai Zhang 57 263 0 26 Oct 2020
Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020 Hee-Soo Heo Bong-Jin Lee Jaesung Huh Joon Son Chung 30 133 0 29 Sep 2020
Expressive TTS Training with Frame and Style Reconstruction Loss Rui Liu Berrak Sisman Guanglai Gao Haizhou Li 69 73 0 04 Aug 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Yonghui Wu 40 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 62 92 0 06 Feb 2020
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech Vatsal Aggarwal Marius Cotescu N. Prateek Jaime Lorenzo-Trueba Roberto Barra-Chicote 38 30 0 28 Nov 2019
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features Francesco Ferroni Kilol Gupta D. Shah Z. Shakeri Jervis Pinto 37 15 0 21 Nov 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency M. Whitehill Shuang Ma Daniel J. McDuff Yale Song 45 35 0 25 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis Raza Habib Soroosh Mariooryad Matt Shannon Eric Battenberg RJ Skerry-Ryan Daisy Stanton David Kao Tom Bagby BDL 37 48 0 03 Oct 2019
Universal audio synthesizer control with normalizing flows P. Esling Naotake Masuda Adrien Bardet R. Despres Axel Chemla-Romeu-Santos 46 45 0 01 Jul 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training Peng Wu Zhenhua Ling Li-Juan Liu Yuan Jiang Hong-Chuan Wu Lirong Dai 34 72 0 26 Jun 2019
A New GAN-based End-to-End TTS Training Algorithm Haohan Guo Frank Soong Lei He Lei Xie 47 47 0 09 Apr 2019
Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis Yanyao Bian Changbin Chen Yongguo Kang Zhenglin Pan 35 46 0 04 Apr 2019
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet J. Valin Jan Skoglund 51 79 0 28 Mar 2019
Learning latent representations for style control and transfer in end-to-end speech synthesis Ya-Jie Zhang Shifeng Pan Lei He Zhenhua Ling BDL SSL DRL 48 228 0 11 Dec 2018
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction J. Valin Jan Skoglund 62 451 0 28 Oct 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis Wei-Ning Hsu Yu Zhang Ron J. Weiss Heiga Zen Yonghui Wu ... Ye Jia Zhiwen Chen Jonathan Shen Patrick Nguyen Ruoming Pang BDL 60 275 0 16 Oct 2018
The Deep Weight Prior Andrei Atanov Arsenii Ashukha Kirill Struminsky Dmitry Vetrov Max Welling BDL 44 37 0 16 Oct 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis Daisy Stanton Yuxuan Wang RJ Skerry-Ryan 51 122 0 04 Aug 2018
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 348 2,274 0 14 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Zhiwen Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 251 828 0 12 Jun 2018
Self-Attention Generative Adversarial Networks Han Zhang Ian Goodfellow Dimitris N. Metaxas Augustus Odena GAN 129 3,720 0 21 May 2018
Understanding disentangling in $β$-VAE Christopher P. Burgess I. Higgins Arka Pal Loic Matthey Nicholas Watters Guillaume Desjardins Alexander Lerchner CoGe DRL 57 829 0 10 Apr 2018
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder K. Akuzawa Yusuke Iwasawa Y. Matsuo 35 139 0 06 Apr 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron RJ Skerry-Ryan Eric Battenberg Y. Xiao Yuxuan Wang Daisy Stanton Joel Shor Ron J. Weiss R. Clark Rif A. Saurous 54 554 0 24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 64 825 0 23 Mar 2018
Sylvester Normalizing Flows for Variational Inference Rianne van den Berg Leonard Hasenclever Jakub M. Tomczak Max Welling BDL DRL 58 252 0 15 Mar 2018
Efficient Neural Audio Synthesis Nal Kalchbrenner Erich Elsen Karen Simonyan Seb Noury Norman Casagrande Edward Lockhart Florian Stimberg Aaron van den Oord Sander Dieleman Koray Kavukcuoglu 87 867 0 23 Feb 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 77 2,694 0 16 Dec 2017
Parallel WaveNet: Fast High-Fidelity Speech Synthesis Aaron van den Oord Yazhe Li Igor Babuschkin Karen Simonyan Oriol Vinyals ... Alex Graves Helen King T. Walters Dan Belov Demis Hassabis 181 858 0 28 Nov 2017
Generalized End-to-End Loss for Speaker Verification Li Wan Quan Wang Alan Papir Ignacio López Moreno VLM 66 924 0 28 Oct 2017
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning Wei Ping Kainan Peng Andrew Gibiansky Sercan O. Arik Ajay Kannan Sharan Narang Jonathan Raiman John Miller 63 307 0 20 Oct 2017
Attention Is All You Need Ashish Vaswani Noam M. Shazeer Niki Parmar Jakob Uszkoreit Llion Jones Aidan Gomez Lukasz Kaiser Illia Polosukhin 3DV 591 130,942 0 12 Jun 2017
Deep Voice 2: Multi-Speaker Neural Text-to-Speech Sercan O. Arik G. Diamos Andrew Gibiansky John Miller Kainan Peng Wei Ping Jonathan Raiman Yanqi Zhou 70 495 0 24 May 2017
Learning Latent Representations for Speech Generation and Transformation Wei-Ning Hsu Yu Zhang James R. Glass DRL BDL SSL 50 145 0 13 Apr 2017
Tacotron: Towards End-to-End Speech Synthesis Yuxuan Wang RJ Skerry-Ryan Daisy Stanton Yonghui Wu Ron J. Weiss ... Samy Bengio Quoc V. Le Yannis Agiomyrgiannakis R. Clark Rif A. Saurous 153 1,819 0 29 Mar 2017
Disentangling factors of variation in deep representations using adversarial training Michaël Mathieu Jiaqi Zhao Pablo Sprechmann Aditya A. Ramesh Yann LeCun DRL CML 89 490 0 10 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation Yonghui Wu M. Schuster Zhiwen Chen Quoc V. Le Mohammad Norouzi ... Alex Rudnick Oriol Vinyals G. Corrado Macduff Hughes J. Dean AIMat 847 6,781 0 26 Sep 2016
Improving Variational Inference with Inverse Autoregressive Flow Diederik P. Kingma Tim Salimans Rafal Jozefowicz Xi Chen Ilya Sutskever Max Welling BDL DRL 105 1,816 0 15 Jun 2016
Generating Sentences from a Continuous Space Samuel R. Bowman Luke Vilnis Oriol Vinyals Andrew M. Dai Rafal Jozefowicz Samy Bengio DRL 98 2,358 0 19 Nov 2015
Attention-Based Models for Speech Recognition J. Chorowski Dzmitry Bahdanau Dmitriy Serdyuk Kyunghyun Cho Yoshua Bengio 115 2,606 0 24 Jun 2015
Variational Inference with Normalizing Flows Danilo Jimenez Rezende S. Mohamed DRL BDL 284 4,167 0 21 May 2015
MADE: Masked Autoencoder for Distribution Estimation M. Germain Karol Gregor Iain Murray Hugo Larochelle OOD SyDa UQCV 146 867 0 12 Feb 2015
Grammar as a Foreign Language Oriol Vinyals Lukasz Kaiser Terry Koo Slav Petrov Ilya Sutskever Geoffrey E. Hinton 97 930 0 23 Dec 2014