Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning

20 October 2017
Wei Ping
Kainan Peng
Andrew Gibiansky
Sercan Ö. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller

Papers citing "Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning"

50 / 74 papers shown
Title
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
20
0
0
14 May 2025
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
23
0
0
27 Oct 2023
ContextSpeech: Expressive and Efficient Text-to-Speech for Paragraph Reading
Yujia Xiao
Shaofei Zhang
Xi Wang
Xuejiao Tan
Lei He
Sheng Zhao
Frank Soong
Tan Lee
25
5
0
03 Jul 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Towards Building Text-To-Speech Systems for the Next Billion Users
Gokul Karthik Kumar
Praveen S V
Pratyush Kumar
Mitesh M. Khapra
Karthik Nandakumar
38
18
0
17 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
41
18
0
01 Nov 2022
Text-to-speech synthesis from dark data with evaluation-in-the-loop data selection
Kentaro Seki
Shinnosuke Takamichi
Takaaki Saeki
Hiroshi Saruwatari
23
6
0
26 Oct 2022
The Sound of Silence: Efficiency of First Digit Features in Synthetic Audio Detection
Daniele Mari
Federica Latora
Simone Milani
13
11
0
06 Oct 2022
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B. W. Schuller
Haizhou Li
14
44
0
11 Aug 2022
Controllable Data Generation by Deep Learning: A Review
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
33
28
0
19 Jul 2022
Show Me Your Face, And I'll Tell You How You Speak
Christen Millerdurai
L. A. Khaliq
Timon Ulrich
CVBM
68
0
0
28 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
44
213
0
09 May 2022
Heterogeneous Target Speech Separation
Hyunjae Cho
Wonbin Jung
Junhyeok Lee
Paris Smaragdis
Sanghyun Woo
46
26
0
07 Apr 2022
Self-supervised learning for robust voice cloning
Konstantinos Klapsas
Nikolaos Ellinas
Karolos Nikitaras
G. Vamvoukakis
Panos Kakoulidis
...
S. Raptis
June Sig Sung
Gunu Jho
Aimilios Chalamandaris
Pirros Tsiakoulis
SSL
27
6
0
07 Apr 2022
Residual-guided Personalized Speech Synthesis based on Face Image
Jianrong Wang
Zixuan Wang
Xiaosheng Hu
Xuewei Li
Qiang Fang
Li Liu
CVBM
24
16
0
01 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
VLM
DiffM
14
66
0
01 Apr 2022
Real time spectrogram inversion on mobile phone
Oleg Rybakov
Marco Tagliasacchi
Yunpeng Li
Liyang Jiang
Xia Zhang
Fadi Biadsy
21
4
0
01 Mar 2022
Synthesizing Dysarthric Speech Using Multi-talker TTS for Dysarthric Speech Recognition
M. Soleymanpour
Michael T. Johnson
Rahim Soleymanpour
J. Berry
32
28
0
27 Jan 2022
A two-step backward compatible fullband speech enhancement system
Xu Zhang
Lianwu Chen
Xiguang Zheng
Xinlei Ren
Chen Zhang
Liang Guo
Bin Yu
59
6
0
26 Jan 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
59
18
0
24 Jan 2022
Meta-TTS: Meta-Learning for Few-Shot Speaker Adaptive Text-to-Speech
Sung-Feng Huang
Chyi-Jiunn Lin
Da-Rong Liu
Yi-Chen Chen
Hung-yi Lee
18
56
0
07 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
10
17
0
07 Nov 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
26
2
0
06 Oct 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion
Daxin Tan
Liqun Deng
Y. Yeung
Xin Jiang
Xiao Chen
Tan Lee
29
37
0
04 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
26
36
0
29 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
42
9
0
18 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
86
842
0
11 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
22
160
0
06 Jun 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
19
8
0
16 Apr 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
25
187
0
01 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
39
22
0
12 Feb 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
24
5
0
14 Dec 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
25
11
0
24 Nov 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
19
102
0
22 Oct 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
19
92
0
03 Sep 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
18
90
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
38
317
0
09 Aug 2020
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network
Yi-Chiao Wu
Tomoki Hayashi
Patrick Lumban Tobing
Kazuhiro Kobayashi
T. Toda
27
18
0
11 Jul 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
60
1,357
0
08 Jun 2020
Universal Adversarial Perturbations: A Survey
Ashutosh Chaubey
Nikhil Agrawal
Kavya Barnwal
K. K. Guliani
Pramod Mehta
OOD
AAML
36
46
0
16 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
21
119
0
12 May 2020
Direct Speech-to-image Translation
Jiguo Li
Xinfeng Zhang
Chuanmin Jia
Jizheng Xu
Li Zhang
Y. Wang
Siwei Ma
Wen Gao
36
29
0
07 Apr 2020
Vocoder-Based Speech Synthesis from Silent Videos
Daniel Michelsanti
Olga Slizovskaia
G. Haro
Emilia Gómez
Zheng-Hua Tan
Jesper Jensen
31
31
0
06 Apr 2020
DeepFake Detection: Current Challenges and Next Steps
Siwei Lyu
55
158
0
11 Mar 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Ting-Yao Hu
A. Shrivastava
Oncel Tuzel
C. Dhir
6
30
0
09 Mar 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
VLM
25
56
0
04 Mar 2020
Semi-Supervised Neural Architecture Search
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Enhong Chen
Tie-Yan Liu
13
88
0
24 Feb 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
11
130
0
06 Feb 2020