Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhehuai Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
arXiv: 1712.05884

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

50 / 545 papers shown
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
27
8
0
16 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Jingbei Li
Zhiyong Wu
Yanyao Bian
Dan Su
Helen Meng
41
6
0
14 Apr 2021
Comparing the Benefit of Synthetic Training Data for Various Automatic Speech Recognition Architectures
Nick Rossenbach
Mohammad Zeineldeen
Benedikt Hilmes
Ralf Schlüter
Hermann Ney
36
12
0
12 Apr 2021
Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features
Mahsa Elyasi
Gaurav Bharaj
19
2
0
08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li
Changhe Song
Jingbei Li
Zhiyong Wu
Jia Jia
Helen Meng
25
47
0
08 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark Gales
31
7
0
02 Apr 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
50
81
0
28 Mar 2021
Latent Space Explorations of Singing Voice Synthesis using DDSP
J. Alonso
Cumhur Erkut
46
12
0
12 Mar 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech
C. Chien
Jheng-hao Lin
Chien-yu Huang
Po-Chun Hsu
Hung-yi Lee
27
68
0
06 Mar 2021
WaveGuard: Understanding and Mitigating Audio Adversarial Examples
Shehzeen Samarah Hussain
Paarth Neekhara
Shlomo Dubnov
Julian McAuley
F. Koushanfar
AAML
33
71
0
04 Mar 2021
A Spectral Enabled GAN for Time Series Data Generation
Kaleb E. Smith
Anthony O. Smith
GAN
30
12
0
02 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
37
188
0
01 Mar 2021
MBNet: MOS Prediction for Synthesized Speech with Mean-Bias Network
Yichong Leng
Xu Tan
Sheng Zhao
Frank Soong
Xiang-Yang Li
Tao Qin
32
96
0
27 Feb 2021
MaskCycleGAN-VC: Learning Non-parallel Voice Conversion with Filling in Frames
Takuhiro Kaneko
Hirokazu Kameoka
Kou Tanaka
Nobukatsu Hojo
38
57
0
25 Feb 2021
Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model
Junwei Liao
Yu Shi
Ming Gong
Linjun Shou
Sefik Emre Eskimez
Liyang Lu
Hong Qu
Michael Zeng
25
9
0
22 Feb 2021
AISPEECH-SJTU accent identification system for the Accented English Speech Recognition Challenge
Houjun Huang
Xu Xiang
Yexin Yang
Rao Ma
Y. Qian
19
25
0
19 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
42
22
0
12 Feb 2021
Universal Neural Vocoding with Parallel WaveNet
Yunlong Jiao
Adam Gabryś
Georgi Tinchev
Bartosz Putrycz
Daniel Korzekwa
V. Klimkov
36
42
0
01 Feb 2021
Generating coherent spontaneous speech and gesture from text
Simon Alexanderson
Éva Székely
G. Henter
Taras Kucherenko
Jonas Beskow
SLR
37
22
0
14 Jan 2021
A Survey on Deep Reinforcement Learning for Audio-Based Applications
S. Latif
Heriberto Cuayáhuitl
Farrukh Pervez
Fahad Shamshad
Hafiz Shehbaz Ali
Min Zhang
OffRL
60
73
0
01 Jan 2021
Few Shot Adaptive Normalization Driven Multi-Speaker Speech Synthesis
Neeraj Kumar
Srishti Goel
Ankur Narang
Brejesh Lall
29
5
0
14 Dec 2020
DeepTalk: Vocal Style Encoding for Speaker Recognition and Speech Synthesis
Anurag Chowdhury
Arun Ross
Prabu David
16
5
0
09 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Lingwei Kong
Jing Xiao
16
9
0
03 Dec 2020
Synth2Aug: Cross-domain speaker recognition with TTS synthesized speech
Yiling Huang
Yutian Chen
Jason W. Pelecanos
Quan Wang
33
11
0
24 Nov 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
28
73
0
17 Nov 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang
Huaiping Ming
Lei He
Frank Soong
19
5
0
17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
27
55
0
17 Nov 2020
Hierarchical Prosody Modeling for Non-Autoregressive Speech Synthesis
C. Chien
Hung-yi Lee
32
36
0
12 Nov 2020
Enhancing Low-Quality Voice Recordings Using Disentangled Channel Factor and Neural Waveform Model
Haoyu Li
Yang Ai
Junichi Yamagishi
17
2
0
10 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
31
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
24
98
0
06 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
S. Karlapati
Ammar Abbas
Zack Hodari
Alexis Moinet
Arnaud Joly
Panagiota Karanasou
Thomas Drugman
28
19
0
04 Nov 2020
PPG-based singing voice conversion with adversarial representation learning
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Linjia Xu
Chen Shen
Zejun Ma
19
37
0
28 Oct 2020
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition
Xiong Cai
Dongyang Dai
Zhiyong Wu
Xiang Li
Jingbei Li
Helen Meng
14
66
0
26 Oct 2020
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis
Rui Liu
Berrak Sisman
Haizhou Li
20
24
0
23 Oct 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
35
219
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
30
102
0
22 Oct 2020
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion
Shengkui Zhao
Trung Hieu Nguyen
Hao Wang
B. Ma
18
25
0
16 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
54
1,869
0
12 Oct 2020
Improving Low Resource Code-switched ASR using Augmented Code-switched TTS
Yash Sharma
Basil Abraham
Karan Taneja
Preethi Jyothi
19
20
0
12 Oct 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
36
1,397
0
21 Sep 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features
T. Raitio
Ramya Rasipuram
D. Castellani
42
66
0
14 Sep 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
21
92
0
03 Sep 2020
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
Hyun-Wook Yoon
Sang-Hoon Lee
Hyeong-Rae Noh
Seong-Whan Lee
20
11
0
16 Aug 2020
SpeedySpeech: Efficient Neural Speech Synthesis
Jan Vainer
Ondrej Dusek
24
42
0
09 Aug 2020
Deep MOS Predictor for Synthetic Speech Using Cluster-Based Modeling
Yeunju Choi
Youngmoon Jung
Hoirin Kim
24
26
0
09 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
18
90
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
45
318
0
09 Aug 2020
Pretraining Techniques for Sequence-to-Sequence Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Yi-Chiao Wu
Hirokazu Kameoka
T. Toda
27
38
0
07 Aug 2020