Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

20 October 2017
Wei Ping
Kainan Peng
Andrew Gibiansky
Sercan Ö. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller

Papers citing "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"

28 / 78 papers shown
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
VLM
33
56
0
04 Mar 2020
Semi-Supervised Neural Architecture Search
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Enhong Chen
Tie-Yan Liu
13
88
0
24 Feb 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Yonghui Wu
16
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
36
92
0
06 Feb 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schlüter
Hermann Ney
18
83
0
19 Dec 2019
Vision-Infused Deep Audio Inpainting
Hang Zhou
Ziwei Liu
Lingfeng Guo
Ping Luo
Dahua Lin
35
88
0
24 Oct 2019
High Fidelity Speech Synthesis with Adversarial Networks
Mikolaj Binkowski
Jeff Donahue
Sander Dieleman
Aidan Clark
Erich Elsen
Norman Casagrande
Luis C. Cobo
Karen Simonyan
243
239
0
25 Sep 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
25
22
0
19 Aug 2019
Polyphone Disambiguation for Mandarin Chinese Using Conditional Neural Network with Multi-level Embedding Features
Zexin Cai
Yaogen Yang
Chuxiong Zhang
Xiaoyi Qin
Ming Li
32
26
0
03 Jul 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
V. Wan
Chun-an Chan
Tom Kenter
Jakub Vít
R. Clark
24
75
0
17 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition
Yi Ren
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
44
101
0
13 May 2019
Probability density distillation with generative adversarial networks for high-quality parallel waveform generation
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
19
55
0
09 Apr 2019
Feature reinforcement with word embedding and parsing information in neural TTS
Huaiping Ming
Lei He
Haohan Guo
Frank Soong
74
15
0
03 Jan 2019
FPETS : Fully Parallel End-to-End Text-to-Speech System
Dabiao Ma
Zhiba Su
Wenxuan Wang
Yuhao Lu
24
6
0
12 Dec 2018
Activation Functions: Comparison of trends in Practice and Research for Deep Learning
Chigozie Nwankpa
Winifred Ijomah
Anthony Gachagan
Stephen Marshall
22
1,269
0
08 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Ye Jia
Melvin Johnson
Wolfgang Macherey
Ron J. Weiss
Yuan Cao
Chung-Cheng Chiu
Naveen Ari
Stella Laurenzo
Yonghui Wu
31
159
0
05 Nov 2018
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention
Bajibabu Bollepalli
Lauri Juvela
P. Alku
17
4
0
29 Oct 2018
Sequence-to-Sequence Acoustic Modeling for Voice Conversion
Jing-Xuan Zhang
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Lirong Dai
16
129
0
16 Oct 2018
Sample Efficient Adaptive Text-to-Speech
Yutian Chen
Yannis Assael
Brendan Shillingford
David Budden
Scott E. Reed
...
Ben Laurie
Çağlar Gülçehre
Aaron van den Oord
Oriol Vinyals
Nando de Freitas
35
149
0
27 Sep 2018
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks
Sercan Ö. Arik
Heewoo Jun
G. Diamos
14
107
0
20 Aug 2018
Multi-task WaveNet: A Multi-task Generative Model for Statistical Parametric Speech Synthesis without Fundamental Frequency Conditions
Yu Gu
Yongguo Kang
12
17
0
22 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Z. Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
207
820
0
12 Jun 2018
Voice Imitating Text-to-Speech Neural Networks
Younggun Lee
Taesu Kim
Soo-Young Lee
29
11
0
04 Jun 2018
A Universal Music Translation Network
Noam Mor
Lior Wolf
Adam Polyak
Yaniv Taigman
22
110
0
21 May 2018
Collapsed speech segment detection and suppression for WaveNet vocoder
Yi-Chiao Wu
Kazuhiro Kobayashi
Tomoki Hayashi
Patrick Lumban Tobing
T. Toda
12
25
0
30 Apr 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
26
815
0
23 Mar 2018
Fitting New Speakers Based on a Short Untranscribed Sample
Eliya Nachmani
Adam Polyak
Yaniv Taigman
Lior Wolf
24
84
0
20 Feb 2018
Adversarial Audio Synthesis
Chris Donahue
Julian McAuley
M. Puckette
GAN
45
604
0
12 Feb 2018