Emotional Prosody Control for Speech Generation

7 November 2021

Papers citing "Emotional Prosody Control for Speech Generation"

Title
EmoSphere++: Emotion-Controllable Zero-Shot Text-to-Speech via Emotion-Adaptive Spherical Vector Deok-Hyeon Cho Hyung-Seok Oh Seung-Bin Kim Seong-Whan Lee 106 8 0 04 Nov 2024
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 105 1,396 0 08 Jun 2020
CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech S. Karlapati Alexis Moinet Arnaud Joly V. Klimkov Daniel Sáez-Trigueros Thomas Drugman 36 67 0 30 Apr 2020
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis Kundan Kumar Rithesh Kumar T. Boissière L. Gestin Wei Zhen Teoh Jose M. R. Sotelo A. D. Brébisson Yoshua Bengio Aaron Courville GAN 159 953 0 08 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis Raza Habib Soroosh Mariooryad Matt Shannon Eric Battenberg RJ Skerry-Ryan Daisy Stanton David Kao Tom Bagby BDL 42 48 0 03 Oct 2019
Effective Use of Variational Embedding Capacity in Expressive End-to-End Speech Synthesis Eric Battenberg Soroosh Mariooryad Daisy Stanton RJ Skerry-Ryan Matt Shannon David Kao Tom Bagby BDL 44 45 0 08 Jun 2019
Token-Level Ensemble Distillation for Grapheme-to-Phoneme Conversion Hao Sun Xu Tan Jun-Wei Gan Hongzhi Liu Sheng Zhao Tao Qin Tie-Yan Liu 47 66 0 06 Apr 2019
ClariNet: Parallel Wave Generation in End-to-End Text-to-Speech Wei Ping Kainan Peng Jitong Chen 53 346 0 19 Jul 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Zhiwen Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 254 830 0 12 Jun 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron RJ Skerry-Ryan Eric Battenberg Y. Xiao Yuxuan Wang Daisy Stanton Joel Shor Ron J. Weiss R. Clark Rif A. Saurous 54 554 0 24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 64 826 0 23 Mar 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions Jonathan Shen Ruoming Pang Ron J. Weiss M. Schuster Navdeep Jaitly ... Yuxuan Wang RJ Skerry-Ryan Rif A. Saurous Yannis Agiomyrgiannakis Yonghui Wu 77 2,697 0 16 Dec 2017
Parallel WaveNet: Fast High-Fidelity Speech Synthesis Aaron van den Oord Yazhe Li Igor Babuschkin Karen Simonyan Oriol Vinyals ... Alex Graves Helen King T. Walters Dan Belov Demis Hassabis 210 858 0 28 Nov 2017
Generalized End-to-End Loss for Speaker Verification Li Wan Quan Wang Alan Papir Ignacio López Moreno VLM 68 926 0 28 Oct 2017
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning Wei Ping Kainan Peng Andrew Gibiansky Sercan O. Arik Ajay Kannan Sharan Narang Jonathan Raiman John Miller 66 307 0 20 Oct 2017
VoxCeleb: a large-scale speaker identification dataset Arsha Nagrani Joon Son Chung Andrew Zisserman 122 2,273 0 26 Jun 2017
Deep Voice 2: Multi-Speaker Neural Text-to-Speech Sercan O. Arik G. Diamos Andrew Gibiansky John Miller Kainan Peng Wei Ping Jonathan Raiman Yanqi Zhou 70 496 0 24 May 2017
Tacotron: Towards End-to-End Speech Synthesis Yuxuan Wang RJ Skerry-Ryan Daisy Stanton Yonghui Wu Ron J. Weiss ... Samy Bengio Quoc V. Le Yannis Agiomyrgiannakis R. Clark Rif A. Saurous 155 1,823 0 29 Mar 2017
Deep Voice: Real-time Neural Text-to-Speech Sercan O. Arik Mike Chrzanowski Adam Coates G. Diamos Andrew Gibiansky ... John Miller Andrew Ng Jonathan Raiman Shubho Sengupta Mohammad Shoeybi 80 616 0 25 Feb 2017
WaveNet: A Generative Model for Raw Audio Aaron van den Oord Sander Dieleman Heiga Zen Karen Simonyan Oriol Vinyals Alex Graves Nal Kalchbrenner A. Senior Koray Kavukcuoglu DiffM 404 7,391 0 12 Sep 2016