ResearchTrend.AI
Deep Voice: Real-time Neural Text-to-Speech

25 February 2017
Sercan Ö. Arik
Mike Chrzanowski
Adam Coates
G. Diamos
Andrew Gibiansky
Yongguo Kang
Xian Li
John Miller
Andrew Ng
Jonathan Raiman
Shubho Sengupta
M. Shoeybi

Papers citing "Deep Voice: Real-time Neural Text-to-Speech"

50 / 94 papers shown
DPN-GAN: Inducing Periodic Activations in Generative Adversarial Networks for High-Fidelity Audio Synthesis
Zeeshan Ahmad
Shudi Bao
Meng Chen
20
0
0
14 May 2025
Consistent estimation of generative model representations in the data kernel perspective space
Aranyak Acharyya
M. Trosset
Carey E. Priebe
Hayden Helm
DiffM
68
3
0
20 Jan 2025
CrossSpeech++: Cross-lingual Speech Synthesis with Decoupled Language and Speaker Generation
Ji-Hoon Kim
Hong-Sun Yang
Yoon-Cheol Ju
Il-Hwan Kim
Byeong-Yeol Kim
Joon Son Chung
BDL
54
0
0
31 Dec 2024
E1 TTS: Simple and Fast Non-Autoregressive TTS
Zhijun Liu
Shuai Wang
Pengcheng Zhu
Mengxiao Bi
Haizhou Li
VLM
DiffM
38
3
0
14 Sep 2024
Style Description based Text-to-Speech with Conditional Prosodic Layer Normalization based Diffusion GAN
Neeraj Kumar
Ankur Narang
Brejesh Lall
DiffM
29
0
0
27 Oct 2023
Kernel Limit of Recurrent Neural Networks Trained on Ergodic Data Sequences
Samuel Chun-Hei Lam
Justin A. Sirignano
K. Spiliopoulos
30
2
0
28 Aug 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
J. Dang
DiffM
30
8
0
28 Jul 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
MnTTS2: An Open-Source Multi-Speaker Mongolian Text-to-Speech Synthesis Dataset
Kailin Liang
Bin Liu
Yifan Hu
Rui Liu
F. Bao
Guanglai Gao
28
1
0
11 Dec 2022
Learning to Dub Movies via Hierarchical Prosody Models
Gaoxiang Cong
Liang Li
Yuankai Qi
Zhengjun Zha
Qi Wu
Wen-yu Wang
Bin Jiang
Ming Yang
Qin Huang
75
25
0
08 Dec 2022
Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning
Ankur Debnath
Shridevi S Patil
Gangotri Nadiger
R. Ganesan
26
20
0
07 Dec 2022
Evince the artifacts of Spoof Speech by blending Vocal Tract and Voice Source Features
T. U. K. Reddy
Sahukari Chaitanya Varun
Kota Pranav Kumar Sankala Sreekanth
K. Murty
23
0
0
05 Dec 2022
Deep Fake Detection, Deterrence and Response: Challenges and Opportunities
Amin Azmoodeh
Ali Dehghantanha
45
2
0
26 Nov 2022
Towards zero-shot Text-based voice editing using acoustic context conditioning, utterance embeddings, and reference encoders
Jason Fong
Yun Wang
Prabhav Agrawal
Vimal Manohar
Jilong Wu
Thilo Kohler
Qing He
15
0
0
28 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gökçe İymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
20
53
0
06 Oct 2022
MnTTS: An Open-Source Mongolian Text-to-Speech Synthesis Dataset and Accompanied Baseline
Yifan Hu
Pengkai Yin
Rui Liu
F. Bao
Guanglai Gao
18
5
0
22 Sep 2022
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
34
6
0
22 Sep 2022
Detecting Synthetic Speech Manipulation in Real Audio Recordings
M. Rahman
M. Graciarena
Diego Castán
Chris Cobo-Kroenke
Mitchell McLaren
A. Lawson
AAML
25
9
0
15 Sep 2022
On the Horizon: Interactive and Compositional Deepfakes
Eric Horvitz
16
27
0
05 Sep 2022
Glow-WaveGAN 2: High-quality Zero-shot Text-to-speech Synthesis and Any-to-any Voice Conversion
Yinjiao Lei
Shan Yang
Jian Cong
Linfu Xie
Dan Su
DiffM
52
12
0
05 Jul 2022
RF-Next: Efficient Receptive Field Search for Convolutional Neural Networks
Shanghua Gao
Zhong-Yu Li
Qi Han
Ming-Ming Cheng
Liang Wang
32
34
0
14 Jun 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
44
213
0
09 May 2022
Cross-Lingual Text-to-Speech Using Multi-Task Learning and Speaker Classifier Joint Training
J. Yang
Lei He
36
11
0
20 Jan 2022
How Deep Are the Fakes? Focusing on Audio Deepfake: A Survey
Zahra Khanjani
Gabrielle Watson
V. P Janeja
25
25
0
28 Nov 2021
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
23
0
25 Nov 2021
Emotional Prosody Control for Speech Generation
S. Sivaprasad
Saiteja Kosgi
Vineet Gandhi
10
17
0
07 Nov 2021
WaveFake: A Data Set to Facilitate Audio Deepfake Detection
Joel Frank
Lea Schönherr
DiffM
129
123
0
04 Nov 2021
Neural Dubber: Dubbing for Videos According to Scripts
Chenxu Hu
Qiao Tian
Tingle Li
Yuping Wang
Yuxuan Wang
Hang Zhao
DiffM
VGen
36
39
0
15 Oct 2021
Cross-speaker Emotion Transfer Based on Speaker Condition Layer Normalization and Semi-Supervised Training in Text-To-Speech
Pengfei Wu
Junjie Pan
Chenchang Xu
Junhui Zhang
Lin Wu
Xiang Yin
Zejun Ma
10
16
0
08 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
26
2
0
06 Oct 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
47
6
0
16 Aug 2021
Video Generation from Text Employing Latent Path Construction for Temporal Modeling
Amir Mazaheri
M. Shah
30
8
0
29 Jul 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
18
352
0
29 Jun 2021
AI based Presentation Creator With Customized Audio Content Delivery
Muvazima Mansoor
Srikanth Chandar
Ramamoorthy Srinath
26
0
0
27 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Dan Su
20
30
0
21 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
25
160
0
06 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
18
55
0
24 May 2021
RotLSTM: Rotating Memories in Recurrent Neural Networks
Vlad Velici
Adam Prügel-Bennett
RALM
VLM
17
1
0
01 May 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
21
8
0
16 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
45
81
0
28 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
37
187
0
01 Mar 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
42
22
0
12 Feb 2021
Universal Neural Vocoding with Parallel WaveNet
Yunlong Jiao
Adam Gabryś
Georgi Tinchev
Bartosz Putrycz
Daniel Korzekwa
V. Klimkov
36
42
0
01 Feb 2021
Empowering Things with Intelligence: A Survey of the Progress, Challenges, and Opportunities in Artificial Intelligence of Things
Jing Zhang
Dacheng Tao
45
462
0
17 Nov 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
34
1,392
0
21 Sep 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
19
92
0
03 Sep 2020
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
Hyun-Wook Yoon
Sang-Hoon Lee
Hyeong-Rae Noh
Seong-Whan Lee
20
11
0
16 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
41
317
0
09 Aug 2020
A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep Architecture
Fady K. Fahmy
M. Khalil
Hazem M. Abbas
41
20
0
22 Jul 2020