
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions

16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhehuai Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu

Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"

45 / 545 papers shown
Title
Multi-reference Tacotron by Intercross Training for Style Disentangling, Transfer and Control in Speech Synthesis
Yanyao Bian
Changbin Chen
Yongguo Kang
Zhenglin Pan
18
46
0
04 Apr 2019
Lingvo: a Modular and Scalable Framework for Sequence-to-Sequence Modeling
Jonathan Shen
Patrick Nguyen
Yonghui Wu
Zhehuai Chen
Mengzhao Chen
...
William Chan
Shubham Toshniwal
Baohua Liao
M. Nirschl
Pat Rondon
VLM
27
209
0
21 Feb 2019
Audio-Linguistic Embeddings for Spoken Sentences
Albert Haque
Michelle Guo
Prateek Verma
Li Fei-Fei
28
51
0
20 Feb 2019
Data Efficient Voice Cloning for Neural Singing Synthesis
Merlijn Blaauw
J. Bonada
R. Daido
27
33
0
19 Feb 2019
Unsupervised Polyglot Text To Speech
Eliya Nachmani
Lior Wolf
19
42
0
06 Feb 2019
Exploring Transfer Learning for Low Resource Emotional TTS
Noé Tits
Kevin El Haddad
Thierry Dutoit
25
61
0
14 Jan 2019
Introduction to Voice Presentation Attack Detection and Recent Advances
Md. Sahidullah
Héctor Delgado
Massimiliano Todisco
Tomi Kinnunen
Nicholas W. D. Evans
Junichi Yamagishi
Kong-Aik Lee
AAML
19
75
0
04 Jan 2019
Feature reinforcement with word embedding and parsing information in neural TTS
Huaiping Ming
Lei He
Haohan Guo
Frank Soong
74
15
0
03 Jan 2019
FPETS : Fully Parallel End-to-End Text-to-Speech System
Dabiao Ma
Zhiba Su
Wenxuan Wang
Yuhao Lu
24
6
0
12 Dec 2018
Generative Adversarial Network based Speaker Adaptation for High Fidelity WaveNet Vocoder
Qiao Tian
Bing Yang
Shan Liu
GAN
24
9
0
06 Dec 2018
Learning pronunciation from a foreign language in speech synthesis networks
Younggun Lee
Suwon Shon
Taesu Kim
22
26
0
23 Nov 2018
TimbreTron: A WaveNet(CycleGAN(CQT(Audio))) Pipeline for Musical Timbre Transfer
Sicong Huang
Qiyang Li
Cem Anil
Xuchan Bao
Sageev Oore
Roger C. Grosse
30
97
0
22 Nov 2018
Bytes are All You Need: End-to-End Multilingual Speech Recognition and Synthesis with Bytes
Bo-wen Li
Yu Zhang
Tara N. Sainath
Yonghui Wu
William Chan
AuLLM
24
129
0
22 Nov 2018
Towards achieving robust universal neural vocoding
Jaime Lorenzo-Trueba
Thomas Drugman
Javier Latorre
Thomas Merritt
Bartosz Putrycz
Roberto Barra-Chicote
Alexis Moinet
Vatsal Aggarwal
DRL
23
19
0
15 Nov 2018
PerformanceNet: Score-to-Audio Music Generation with Multi-Band Convolutional Residual Network
Bryan Wang
Yi-Hsuan Yang
28
38
0
11 Nov 2018
ExcitNet vocoder: A neural excitation model for parametric speech synthesis systems
Eunwoo Song
Kyungguen Byun
Hong-Goo Kang
16
29
0
09 Nov 2018
AttS2S-VC: Sequence-to-Sequence Voice Conversion with Attention and Context Preservation Mechanisms
Kou Tanaka
Hirokazu Kameoka
Takuhiro Kaneko
Nobukatsu Hojo
19
111
0
09 Nov 2018
FloWaveNet : A Generative Flow for Raw Audio
Sungwon Kim
Sang-gil Lee
Jongyoon Song
Jaehyeon Kim
Sungroh Yoon
16
168
0
06 Nov 2018
Robust and fine-grained prosody control of end-to-end speech synthesis
Younggun Lee
Jonathan Le Roux
11
147
0
06 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Ye Jia
Melvin Johnson
Wolfgang Macherey
Ron J. Weiss
Yuan Cao
Chung-Cheng Chiu
Naveen Ari
Stella Laurenzo
Yonghui Wu
31
159
0
05 Nov 2018
ConvS2S-VC: Fully convolutional sequence-to-sequence voice conversion
Hirokazu Kameoka
Kou Tanaka
Damian Kwaśny
Takuhiro Kaneko
Nobukatsu Hojo
33
62
0
05 Nov 2018
WaveGlow: A Flow-based Generative Network for Speech Synthesis
R. Prenger
Rafael Valle
Bryan Catanzaro
57
1,023
0
31 Oct 2018
Speaking style adaptation in Text-To-Speech synthesis using Sequence-to-sequence models with attention
Bajibabu Bollepalli
Lauri Juvela
P. Alku
23
4
0
29 Oct 2018
Investigation of enhanced Tacotron text-to-speech synthesis systems with self-attention for pitch accent language
Yusuke Yasuda
Xin Wang
Shinji Takaki
Junichi Yamagishi
22
86
0
29 Oct 2018
STFT spectral loss for training a neural speech waveform model
Shinji Takaki
Toru Nakashika
Xin Wang
Junichi Yamagishi
26
21
0
29 Oct 2018
Reducing over-smoothness in speech synthesis using Generative Adversarial Networks
Leyuan Sheng
Evgeny Nikolaevich Pavlovskiy
GAN
22
9
0
25 Oct 2018
Sequence-to-Sequence Acoustic Modeling for Voice Conversion
Jing-Xuan Zhang
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Lirong Dai
19
129
0
16 Oct 2018
Conditional WaveGAN
Chae Young Lee
Anoop Toffy
G. Jung
W. Han
DiffM
24
21
0
27 Sep 2018
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Yu-An Chung
Yuxuan Wang
Wei-Ning Hsu
Yu Zhang
RJ Skerry-Ryan
24
117
0
30 Aug 2018
Fast Spectrogram Inversion using Multi-head Convolutional Neural Networks
Sercan Ö. Arik
Heewoo Jun
G. Diamos
14
107
0
20 Aug 2018
Sounderfeit: Cloning a Physical Model using a Conditional Adversarial Autoencoder
Stephen Sinclair
GAN
18
1
0
25 Jun 2018
Frequency domain variants of velvet noise and their application to speech processing and synthesis: with appendices
Hideki Kawahara
Ken-Ichi Sakakibara
Masanori Morise
Hideki Banno
T. Toda
Toshio Irino
19
10
0
18 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhehuai Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
207
821
0
12 Jun 2018
Voice Imitating Text-to-Speech Neural Networks
Younggun Lee
Taesu Kim
Soo-Young Lee
34
11
0
04 Jun 2018
A Universal Music Translation Network
Noam Mor
Lior Wolf
Adam Polyak
Yaniv Taigman
25
110
0
21 May 2018
Collapsed speech segment detection and suppression for WaveNet vocoder
Yi-Chiao Wu
Kazuhiro Kobayashi
Tomoki Hayashi
Patrick Lumban Tobing
T. Toda
15
25
0
30 Apr 2018
Speaker-independent raw waveform model for glottal excitation
Lauri Juvela
Vassilis Tsiaras
Bajibabu Bollepalli
Manu Airaksinen
Junichi Yamagishi
P. Alku
19
39
0
25 Apr 2018
Conditional End-to-End Audio Transforms
Albert Haque
Michelle Guo
Prateek Verma
33
41
0
30 Mar 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
RJ Skerry-Ryan
Eric Battenberg
Y. Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
R. Clark
Rif A. Saurous
19
548
0
24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
38
815
0
23 Mar 2018
Neural Network Quine
Oscar Chang
Hod Lipson
21
23
0
15 Mar 2018
Can we steal your vocal identity from the Internet?: Initial investigation of cloning Obama's voice using GAN, WaveNet and low-quality found data
Jaime Lorenzo-Trueba
Fuming Fang
Xin Wang
Isao Echizen
Junichi Yamagishi
Tomi Kinnunen
6
73
0
02 Mar 2018
Do WaveNets Dream of Acoustic Waves?
Kanru Hua
24
1
0
23 Feb 2018
Fitting New Speakers Based on a Short Untranscribed Sample
Eliya Nachmani
Adam Polyak
Yaniv Taigman
Lior Wolf
24
84
0
20 Feb 2018
Adversarial Audio Synthesis
Chris Donahue
Julian McAuley
M. Puckette
GAN
45
604
0
12 Feb 2018