Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
arXiv:1712.05884
16 December 2017
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
Zongheng Yang
Zhehuai Chen
Yu Zhang
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
Papers citing "Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions"
50 / 545 papers shown
Hierarchical prosody modeling and control in non-autoregressive parallel neural TTS
T. Raitio
Jiangchuan Li
Shreyas Seshadri
37
22
0
06 Oct 2021
Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang
A. Shrivastava
H. Koppula
Xiaoshuai Zhang
Oncel Tuzel
DiffM
51
16
0
06 Oct 2021
An Investigation of the Effectiveness of Phase for Audio Classification
Shunsuke Hidaka
Kohei Wakamiya
T. Kaburagi
23
4
0
06 Oct 2021
GANtron: Emotional Speech Synthesis with Generative Adversarial Networks
E. Hortal
Rodrigo Brechard Alarcia
GAN
26
2
0
06 Oct 2021
On the Interplay Between Sparsity, Naturalness, Intelligibility, and Prosody in Speech Synthesis
Cheng-I Jeff Lai
Erica Cooper
Yang Zhang
Shiyu Chang
Kaizhi Qian
...
Yung-Sung Chuang
Alexander H. Liu
Junichi Yamagishi
David D. Cox
James R. Glass
28
6
0
04 Oct 2021
Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification
Bidisha Sharma
Maulik C. Madhavi
Xuehao Zhou
Haizhou Li
23
2
0
28 Sep 2021
Low-Latency Incremental Text-to-Speech Synthesis with Distilled Context Prediction Network
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
34
3
0
22 Sep 2021
"Hello, It's Me": Deep Learning-based Speech Synthesis Attacks in the Real World
Emily Wenger
Max Bronckers
Christian Cianfarani
Jenna Cryan
Angela Sha
Haitao Zheng
Ben Y. Zhao
AAML
40
39
0
20 Sep 2021
Benchmarking and challenges in security and privacy for voice biometrics
J. Bonastre
Héctor Delgado
Nicholas W. D. Evans
Tomi Kinnunen
Kong Aik Lee
...
Massimiliano Todisco
N. Tomashenko
Emmanuel Vincent
Xin Wang
Junichi Yamagishi
31
8
0
01 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
40
18
0
30 Aug 2021
Integrated Speech and Gesture Synthesis
Siyang Wang
Simon Alexanderson
Joakim Gustafson
Jonas Beskow
G. Henter
Éva Székely
37
19
0
25 Aug 2021
One TTS Alignment To Rule Them All
Rohan Badlani
A. Lancucki
Kevin J. Shih
Rafael Valle
Ming-Yu Liu
Bryan Catanzaro
38
82
0
23 Aug 2021
Fighting Game Commentator with Pitch and Loudness Adjustment Utilizing Highlight Cues
Junjie H. Xu
Zhou Fang
Qihang Chen
Satoru Ohno
Pujana Paliyawan
22
4
0
18 Aug 2021
Combining speakers of multiple languages to improve quality of neural voices
Javier Latorre
Charlotte Bailleul
Tuuli H. Morrill
Alistair Conkie
Y. Stylianou
38
8
0
17 Aug 2021
GC-TTS: Few-shot Speaker Adaptation with Geometric Constraints
Ji-Hoon Kim
Sang-Hoon Lee
Ji-Hyun Lee
Hong G Jung
Seong-Whan Lee
47
6
0
16 Aug 2021
SpecMix: A Mixed Sample Data Augmentation Method for Training with Time-Frequency Domain Features
Gwantae Kim
D. Han
Hanseok Ko
50
42
0
06 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Shifeng Pan
Lei He
25
22
0
27 Jul 2021
Use of speaker recognition approaches for learning and evaluating embedding representations of musical instrument sounds
Xuan Shi
Erica Cooper
Junichi Yamagishi
33
7
0
24 Jul 2021
SVSNet: An End-to-end Speaker Voice Similarity Assessment Model
Cheng-Hung Hu
Yu-Huai Peng
Junichi Yamagishi
Yu Tsao
Hsin-Min Wang
29
5
0
20 Jul 2021
Human Perception of Audio Deepfakes
Nicolas Müller
Karla Markert
Konstantin Böttinger
27
49
0
20 Jul 2021
Translatotron 2: High-quality direct speech-to-speech translation with voice preservation
Ye Jia
Michelle Tadmor Ramanovich
Tal Remez
Roi Pomerantz
28
68
0
19 Jul 2021
EditSpeech: A Text Based Speech Editing System Using Partial Inference and Bidirectional Fusion
Daxin Tan
Liqun Deng
Y. Yeung
Xin Jiang
Xiao Chen
Tan Lee
29
38
0
04 Jul 2021
Supervised Contrastive Learning for Accented Speech Recognition
Tao Han
Hantao Huang
Ziang Yang
Wei Han
49
15
0
02 Jul 2021
The USTC-NELSLIP Systems for Simultaneous Speech Translation Task at IWSLT 2021
Dan Liu
Mengge Du
Xiaoxi Li
Yuchen Hu
Lirong Dai
32
20
0
01 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures
Prateek Verma
C. Chafe
30
28
0
30 Jun 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Ammar Abbas
Bajibabu Bollepalli
Alexis Moinet
Arnaud Joly
Penny Karanasou
Peter Makarov
Simon Slangens
S. Karlapati
Thomas Drugman
23
0
0
29 Jun 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
20
353
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
34
36
0
29 Jun 2021
Transflower: probabilistic autoregressive dance generation with multimodal attention
Guillermo Valle Pérez
G. Henter
Jonas Beskow
A. Holzapfel
Pierre-Yves Oudeyer
Simon Alexanderson
35
42
0
25 Jun 2021
Basis-MelGAN: Efficient Neural Vocoder Based on Audio Decomposition
Zhengxi Liu
Y. Qian
DRL
21
10
0
25 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
26
3
0
21 Jun 2021
Non-native English lexicon creation for bilingual speech synthesis
Arun Baby
Pranav Jawale
Saranya Vinnaitherthan
Sumukh Badam
Nagaraj Adiga
Sharath Adavanne
22
8
0
21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Dan Su
22
30
0
21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
46
9
0
18 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
DiffM
23
88
0
17 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
Chenye Cui
Yi Ren
Jinglin Liu
Feiyang Chen
Rongjie Huang
Ming Lei
Zhou Zhao
24
35
0
17 Jun 2021
Ctrl-P: Temporal Control of Prosodic Variation for Speech Synthesis
D. Mohan
Qinmin Hu
Tian Huey Teh
Alexandra Torresquintero
C. Wallis
Marlene Staib
Lorenzo Foglianti
Jiameng Gao
Simon King
25
16
0
15 Jun 2021
UnivNet: A Neural Vocoder with Multi-Resolution Spectrogram Discriminators for High-Fidelity Waveform Generation
Won Jang
D. Lim
Jaesam Yoon
Bongwan Kim
Juntae Kim
38
125
0
15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
37
7
0
14 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling
Jingbei Li
Yi Meng
Chenyi Li
Zhiyong Wu
Helen Meng
Chao Weng
Dan Su
31
24
0
11 Jun 2021
Conditional Variational Autoencoder with Adversarial Learning for End-to-End Text-to-Speech
Jaehyeon Kim
Jungil Kong
Juhee Son
DRL
89
847
0
11 Jun 2021
NWT: Towards natural audio-to-video generation with representation learning
Rayhane Mama
Marc S. Tyndel
Hashiam Kadhim
Cole Clifford
Ragavan Thurairatnam
VGen
31
12
0
08 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
25
160
0
06 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
20
55
0
24 May 2021
Speaker disentanglement in video-to-speech conversion
Dan Oneaţă
Adriana Stan
H. Cucu
24
9
0
20 May 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
61
515
0
13 May 2021
Learning Robust Latent Representations for Controllable Speech Synthesis
Shakti Kumar
Jithin Pradeep
Hussain Zaidi
DRL
41
6
0
10 May 2021
DiffSinger: Singing Voice Synthesis via Shallow Diffusion Mechanism
Jinglin Liu
Chengxi Li
Yi Ren
Feiyang Chen
Zhou Zhao
DiffM
58
259
0
06 May 2021
Exploring emotional prototypes in a high dimensional TTS latent space
Pol van Rijn
Silvan Mertes
Dominik Schiller
Peter M. C. Harrison
P. Larrouy-Maestri
Elisabeth André
Nori Jacoby
28
12
0
05 May 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
Maja Pantic
35
43
0
27 Apr 2021