Deep Voice 2: Multi-Speaker Neural Text-to-Speech

24 May 2017

Papers citing "Deep Voice 2: Multi-Speaker Neural Text-to-Speech"

37 / 87 papers shown

Title
Neural voice cloning with a few low-quality samples Sunghee Jung Hoi-Rim Kim 33 2 0 12 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer Mingjian Chen Xu Tan Yi Ren Jin Xu Hao Sun Sheng Zhao Tao Qin Tie-Yan Liu 21 109 0 08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech Yi Ren Chenxu Hu Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 60 1,357 0 08 Jun 2020
Contrastive Predictive Coding Supported Factorized Variational Autoencoder for Unsupervised Learning of Disentangled Speech Representations Janek Ebbers Michael Kuhlmann Tobias Cord-Landwehr Reinhold Haeb-Umbach DRL CoGe SSL 31 4 0 26 May 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search Jaehyeon Kim Sungwon Kim Jungil Kong Sungroh Yoon 54 475 0 22 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding Seungwoo Choi Seungju Han Dongyoung Kim S. Ha 32 65 0 18 May 2020
Many-to-Many Voice Transformer Network Hirokazu Kameoka Wen-Chin Huang Kou Tanaka Takuhiro Kaneko Nobukatsu Hojo T. Toda ViT 30 30 0 18 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis Rafael Valle Kevin J. Shih R. Prenger Bryan Catanzaro 21 119 0 12 May 2020
Jukebox: A Generative Model for Music Prafulla Dhariwal Heewoo Jun Christine Payne Jong Wook Kim Alec Radford Ilya Sutskever VLM 28 722 0 30 Apr 2020
Direct Speech-to-image Translation Jiguo Li Xinfeng Zhang Chuanmin Jia Jizheng Xu Li Zhang Y. Wang Siwei Ma Wen Gao 36 29 0 07 Apr 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment Zhen Zeng Jianzong Wang Ning Cheng Tian Xia Jing Xiao VLM 25 56 0 04 Mar 2020
Semi-Supervised Neural Architecture Search Renqian Luo Xu Tan Rui Wang Tao Qin Enhong Chen Tie-Yan Liu 13 88 0 24 Feb 2020
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens Rafael Valle Jason Chun Lok Li R. Prenger Bryan Catanzaro 14 148 0 26 Oct 2019
Vision-Infused Deep Audio Inpainting Hang Zhou Ziwei Liu Lingfeng Guo Ping Luo Dahua Lin 35 88 0 24 Oct 2019
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 235 239 0 25 Sep 2019
Maximizing Mutual Information for Tacotron Peng Liu Xixin Wu Shiyin Kang Guangzhi Li Dan Su Dong Yu 22 16 0 30 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck Shuang Ma Daniel J. McDuff Yale Song 25 22 0 19 Aug 2019
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features Zexin Cai Yaogen Yang Chuxiong Zhang Xiaoyi Qin Ming Li 27 26 0 03 Jul 2019
Non-Autoregressive Neural Text-to-Speech Kainan Peng Ming-Yu Liu Z. Song Kexin Zhao 29 39 0 21 May 2019
Adversarially Trained Autoencoders for Parallel-Data-Free Voice Conversion Orhan Ocal Oguz H. Elibol Gokce Keskin Cory Stephenson Anil Thomas Kannan Ramchandran 26 10 0 09 May 2019
TTS Skins: Speaker Conversion via ASR Adam Polyak Lior Wolf Yaniv Taigman 18 27 0 18 Apr 2019
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation Ryuichi Yamamoto Eunwoo Song Jae-Min Kim 19 55 0 09 Apr 2019
Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis Yanyao Bian Changbin Chen Yongguo Kang Zhenglin Pan 15 46 0 04 Apr 2019
Data Efficient Voice Cloning for Neural Singing Synthesis Merlijn Blaauw J. Bonada R. Daido 22 33 0 19 Feb 2019
Securing Voice-driven Interfaces against Fake (Cloned) Audio Attacks Hafiz Malik 13 26 0 18 Feb 2019
Learning pronunciation from a foreign language in speech synthesis networks Younggun Lee Suwon Shon Taesu Kim 20 26 0 23 Nov 2018
Sample Efficient Adaptive Text-to-Speech Yutian Chen Yannis Assael Brendan Shillingford David Budden Scott E. Reed ... Ben Laurie Çağlar Gülçehre Aaron van den Oord Oriol Vinyals Nando de Freitas 35 149 0 27 Sep 2018
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks Sercan Ö. Arik Heewoo Jun G. Diamos 14 106 0 20 Aug 2018
Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions Yu Gu Yongguo Kang 10 17 0 22 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Z. Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 207 820 0 12 Jun 2018
Voice Imitating Text-to-Speech Neural Networks Younggun Lee Taesu Kim Soo-Young Lee 26 11 0 04 Jun 2018
Collapsed speech segment detection and suppression for WaveNet vocoder Yi-Chiao Wu Kazuhiro Kobayashi Tomoki Hayashi Patrick Lumban Tobing T. Toda 7 25 0 30 Apr 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron RJ Skerry-Ryan Eric Battenberg Y. Xiao Yuxuan Wang Daisy Stanton Joel Shor Ron J. Weiss R. Clark Rif A. Saurous 14 547 0 24 Mar 2018
Do WaveNets Dream of Acoustic Waves? Kanru Hua 24 1 0 23 Feb 2018
Fitting New Speakers Based on a Short Untranscribed Sample Eliya Nachmani Adam Polyak Yaniv Taigman Lior Wolf 21 84 0 20 Feb 2018
Adversarial Audio Synthesis Chris Donahue Julian McAuley M. Puckette GAN 39 602 0 12 Feb 2018
NSML: A Machine Learning Platform That Enables You to Focus on Your Models Nako Sung Minkyu Kim Hyunwoo Jo Youngil Yang Jingwoong Kim ... Youngkwan Kim Gayoung Lee Donghyun Kwak Jung-Woo Ha Sunghun Kim 38 86 0 16 Dec 2017