Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

20 October 2017

Sharan Narang

Papers citing "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"

Title
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment Zhen Zeng Jianzong Wang Ning Cheng Tian Xia Jing Xiao VLM 33 56 0 04 Mar 2020
Semi-Supervised Neural Architecture Search Renqian Luo Xu Tan Rui Wang Tao Qin Enhong Chen Tie-Yan Liu 13 88 0 24 Feb 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Yonghui Wu 16 130 0 06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior Guangzhi Sun Yu Zhang Ron J. Weiss Yuan Cao Heiga Zen Andrew Rosenberg Bhuvana Ramabhadran Yonghui Wu DiffM 36 92 0 06 Feb 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems Nick Rossenbach Albert Zeyer Ralf Schlüter Hermann Ney 18 83 0 19 Dec 2019
Vision-Infused Deep Audio Inpainting Hang Zhou Ziwei Liu Lingfeng Guo Ping Luo Dahua Lin 35 88 0 24 Oct 2019
High Fidelity Speech Synthesis with Adversarial Networks Mikolaj Binkowski Jeff Donahue Sander Dieleman Aidan Clark Erich Elsen Norman Casagrande Luis C. Cobo Karen Simonyan 243 239 0 25 Sep 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck Shuang Ma Daniel J. McDuff Yale Song 25 22 0 19 Aug 2019
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features Zexin Cai Yaogen Yang Chuxiong Zhang Xiaoyi Qin Ming Li 32 26 0 03 Jul 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network V. Wan Chun-an Chan Tom Kenter Jakub Vít R. Clark 24 75 0 17 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition Yi Ren Xu Tan Tao Qin Sheng Zhao Zhou Zhao Tie-Yan Liu 44 101 0 13 May 2019
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation Ryuichi Yamamoto Eunwoo Song Jae-Min Kim 19 55 0 09 Apr 2019
Feature reinforcement with word embedding and parsing information in neural TTS Huaiping Ming Lei He Haohan Guo Frank Soong 74 15 0 03 Jan 2019
FPETS: Fully Parallel End-to-End Text-to-Speech System Dabiao Ma Zhiba Su Wenxuan Wang Yuhao Lu 24 6 0 12 Dec 2018
Activation Functions: Comparison of trends in Practice and Research for Deep Learning S. Bodenstedt Dominik Rivoir A. Gachagan S. T. Mees 22 1,269 0 08 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation Ye Jia Melvin Johnson Wolfgang Macherey Ron J. Weiss Yuan Cao Chung-Cheng Chiu Naveen Ari Stella Laurenzo Yonghui Wu 31 159 0 05 Nov 2018
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention Bajibabu Bollepalli Lauri Juvela P. Alku 17 4 0 29 Oct 2018
Sequence-to-Sequence Acoustic Modeling for Voice Conversion Jing-Xuan Zhang Zhenhua Ling Li-Juan Liu Yuan Jiang Lirong Dai 16 129 0 16 Oct 2018
Sample Efficient Adaptive Text-to-Speech Yutian Chen Yannis Assael Brendan Shillingford David Budden Scott E. Reed ... Ben Laurie Çağlar Gülçehre Aaron van den Oord Oriol Vinyals Nando de Freitas 35 149 0 27 Sep 2018
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks Sercan Ö. Arik Heewoo Jun G. Diamos 14 107 0 20 Aug 2018
Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions Yu Gu Yongguo Kang 12 17 0 22 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis Ye Jia Yu Zhang Ron J. Weiss Quan Wang Jonathan Shen ... Z. Chen Patrick Nguyen Ruoming Pang Ignacio López Moreno Yonghui Wu 207 820 0 12 Jun 2018
Voice Imitating Text-to-Speech Neural Networks Younggun Lee Taesu Kim Soo-Young Lee 29 11 0 04 Jun 2018
A Universal Music Translation Network Noam Mor Lior Wolf Adam Polyak Yaniv Taigman 22 110 0 21 May 2018
Collapsed speech segment detection and suppression for WaveNet vocoder Yi-Chiao Wu Kazuhiro Kobayashi Tomoki Hayashi Patrick Lumban Tobing T. Toda 12 25 0 30 Apr 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis Yuxuan Wang Daisy Stanton Yu Zhang RJ Skerry-Ryan Eric Battenberg Joel Shor Y. Xiao Fei Ren Ye Jia Rif A. Saurous 26 815 0 23 Mar 2018
Fitting New Speakers Based on a Short Untranscribed Sample Eliya Nachmani Adam Polyak Yaniv Taigman Lior Wolf 24 84 0 20 Feb 2018
Adversarial Audio Synthesis Chris Donahue Julian McAuley M. Puckette GAN 45 604 0 12 Feb 2018