arXiv:1712.05884
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhifeng Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing
"Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 545 papers shown
Title
Expressive TTS Training with Frame and Style Reconstruction Loss
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
42
73
0
04 Aug 2020
Audiovisual Speech Synthesis using Tacotron2
Ahmed Hussen Abdelaziz
Anushree Prasanna Kumar
Chloe Seivwright
Gabriele Fanelli
Justin Binder
Y. Stylianou
S. Kajarekar
20
15
0
03 Aug 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
Fengyu Yang
Shan Yang
Qinghua Wu
Yujun Wang
Lei Xie
39
5
0
03 Aug 2020
Detecting and analysing spontaneous oral cancer speech in the wild
B. Halpern
Rob van Son
M. V. D. Brekel
O. Scharenborg
24
9
0
28 Jul 2020
Multi-speaker Emotion Conversion via Latent Variable Regularization and a Chained Encoder-Decoder-Predictor Network
Ravi Shankar
Hsi-Wei Hsieh
N. Charon
A. Venkataraman
40
11
0
25 Jul 2020
Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation
Xiaoyuan Yi
Hyeonseung Lee
Wenhao Li
Hyung Yong Kim
Nam Soo Kim
25
22
0
25 Jul 2020
A Transfer Learning End-to-End Arabic Text-To-Speech (TTS) Deep Architecture
Fady K. Fahmy
M. Khalil
Hazem M. Abbas
41
20
0
22 Jul 2020
Neural MOS Prediction for Synthesized Speech Using Multi-Task Learning With Spoofing Detection and Spoofing Type Classification
Yeunju Choi
Youngmoon Jung
Hoirin Kim
16
27
0
16 Jul 2020
Generating Visually Aligned Sound from Videos
Peihao Chen
Yang Zhang
Mingkui Tan
Hongdong Xiao
Deng Huang
Chuang Gan
VGen
24
95
0
14 Jul 2020
Xiaomingbot: A Multilingual Robot News Reporter
Runxin Xu
Jun Cao
Mingxuan Wang
Jiaze Chen
Hao Zhou
...
Xiang Yin
Xijin Zhang
Songcheng Jiang
Yuxuan Wang
Lei Li
23
11
0
12 Jul 2020
Quasi-Periodic WaveNet: An Autoregressive Raw Waveform Generative Model with Pitch-dependent Dilated Convolution Neural Network
Yi-Chiao Wu
Tomoki Hayashi
Patrick Lumban Tobing
Kazuhiro Kobayashi
T. Toda
27
18
0
11 Jul 2020
DeepSinger: Singing Voice Synthesis with Data Mined From the Web
Yi Ren
Xu Tan
Tao Qin
Jian Luan
Zhou Zhao
Tie-Yan Liu
39
73
0
09 Jul 2020
GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
Dmitry Lepikhin
HyoukJoong Lee
Yuanzhong Xu
Dehao Chen
Orhan Firat
Yanping Huang
M. Krikun
Noam M. Shazeer
Zhifeng Chen
MoE
43
1,118
0
30 Jun 2020
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis
Antti Suni
Sofoklis Kakouros
M. Vainio
J. Šimko
19
17
0
29 Jun 2020
Articulatory-WaveNet: Autoregressive Model For Acoustic-to-Articulatory Inversion
Narjes Bozorg
Michael T. Johnson
13
1
0
22 Jun 2020
Adversarially Trained Multi-Singer Sequence-To-Sequence Singing Synthesizer
Jie Wu
Jian Luan
25
26
0
18 Jun 2020
Implicit Neural Representations with Periodic Activation Functions
Vincent Sitzmann
Julien N. P. Martel
Alexander W. Bergman
David B. Lindell
Gordon Wetzstein
AI4TS
47
2,490
0
17 Jun 2020
Adversarial representation learning for private speech generation
David Ericsson
Adam Östberg
Edvin Listo Zec
John Martinsson
Olof Mogren
27
16
0
16 Jun 2020
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
33
2
0
12 Jun 2020
FastPitch: Parallel Text-to-speech with Pitch Prediction
Adrian Łańcucki
42
333
0
11 Jun 2020
HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks
Jiaqi Su
Zeyu Jin
Adam Finkelstein
23
137
0
10 Jun 2020
Deep generative models for musical audio synthesis
M. Huzaifah
L. Wyse
27
20
0
10 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
26
109
0
08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
60
1,362
0
08 Jun 2020
Speech-to-Singing Conversion based on Boundary Equilibrium GAN
Da-Yi Wu
Yi-Hsuan Yang
GAN
14
8
0
28 May 2020
DeepSonar: Towards Effective and Robust Detection of AI-Synthesized Fake Voices
Run Wang
Felix Juefei Xu
Yihao Huang
Qing Guo
Xiaofei Xie
Lei Ma
Yang Liu
AAML
30
105
0
28 May 2020
A comparison of Vietnamese Statistical Parametric Speech Synthesis Systems
Phan Huy Kinh
V. Phung
Anh-Tuan Dinh
Quoc Bao Nguyen
22
1
0
26 May 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
54
478
0
22 May 2020
Pitchtron: Towards audiobook generation from ordinary people's voices
Sunghee Jung
Hoi-Rim Kim
16
5
0
21 May 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
AI4TS
22
31
0
20 May 2020
Unconditional Audio Generation with Generative Adversarial Networks and Cycle Regularization
Jen-Yu Liu
Yu-Hua Chen
Yin-Cheng Yeh
Yi-Hsuan Yang
GAN
34
35
0
18 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
37
66
0
18 May 2020
Many-to-Many Voice Transformer Network
Hirokazu Kameoka
Wen-Chin Huang
Kou Tanaka
Takuhiro Kaneko
Nobukatsu Hojo
T. Toda
ViT
30
30
0
18 May 2020
Learning Individual Speaking Styles for Accurate Lip to Speech Synthesis
Prajwal K R
Rudrabha Mukhopadhyay
Vinay P. Namboodiri
C. V. Jawahar
29
110
0
17 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
16
61
0
14 May 2020
S2IGAN: Speech-to-Image Generation via Adversarial Learning
Xinsheng Wang
Tingting Qiao
Jihua Zhu
Alan Hanjalic
O. Scharenborg
VLM
GAN
32
16
0
14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
23
119
0
12 May 2020
FeatherWave: An efficient high-fidelity neural vocoder with multi-band linear prediction
Qiao Tian
Zewang Zhang
Heng Lu
Linghui Chen
Shan Liu
19
22
0
12 May 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem
Tomoki Hayashi
Shinji Watanabe
27
32
0
12 May 2020
Multi-band MelGAN: Faster Waveform Generation for High-Quality Text-to-Speech
Geng Yang
Shan Yang
Kai-Chun Liu
Peng Fang
Wei Chen
Lei Xie
66
198
0
11 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint
Zexin Cai
Chuxiong Zhang
Ming Li
24
41
0
10 May 2020
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data
Seung-won Park
Doo-young Kim
Myun-chul Joe
29
40
0
07 May 2020
AutoSpeech: Neural Architecture Search for Speaker Recognition
Shaojin Ding
Tianlong Chen
Xinyu Gong
Weiwei Zha
Zhangyang Wang
28
57
0
07 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal
Heewoo Jun
Christine Payne
Jong Wook Kim
Alec Radford
Ilya Sutskever
VLM
55
724
0
30 Apr 2020
Conditional Spoken Digit Generation with StyleGAN
Kasperi Palkama
Lauri Juvela
Alexander Ilin
GAN
24
10
0
28 Apr 2020
Data Processing for Optimizing Naturalness of Vietnamese Text-to-speech System
V. Phung
Phan Huy Kinh
Anh-Tuan Dinh
Quoc Bao Nguyen
28
5
0
20 Apr 2020
F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder
Kaizhi Qian
Zeyu Jin
M. Hasegawa-Johnson
G. J. Mysore
29
107
0
15 Apr 2020
Vocoder-Based Speech Synthesis from Silent Videos
Daniel Michelsanti
Olga Slizovskaia
G. Haro
Emilia Gómez
Zheng-Hua Tan
Jesper Jensen
31
31
0
06 Apr 2020
Improving Perceptual Quality of Drum Transcription with the Expanded Groove MIDI Dataset
Lee F. Callender
Curtis Hawthorne
Jesse Engel
46
20
0
01 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Ting-Yao Hu
A. Shrivastava
Oncel Tuzel
C. Dhir
11
30
0
09 Mar 2020