1703.10135
Cited By
Tacotron: Towards End-to-End Speech Synthesis
29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhifeng Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
Papers citing
"Tacotron: Towards End-to-End Speech Synthesis"
50 / 817 papers shown
Title
Speech Prediction in Silent Videos using Variational Autoencoders
Ravindra Yadav
Ashish Sardana
Vinay P. Namboodiri
R. Hegde
VGen
DRL
29
23
0
14 Nov 2020
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts
Thomas Merritt
Giulia Comini
Bartek Perz
Raahil Shah
Jaime Lorenzo-Trueba
26
51
0
11 Nov 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis
Erica Cooper
Xin Wang
Yi Zhao
Yusuke Yasuda
Junichi Yamagishi
SyDa
14
3
0
10 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
34
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
24
98
0
06 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis
Guanghui Xu
Wei Song
Zhengchen Zhang
Chao Zhang
Xiaodong He
Bowen Zhou
18
50
0
06 Nov 2020
Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities
Hao Zhang
Jae Hun Ro
R. Sproat
6
13
0
05 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
S. Karlapati
Ammar Abbas
Zack Hodari
Alexis Moinet
Arnaud Joly
Panagiota Karanasou
Thomas Drugman
28
19
0
04 Nov 2020
Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time
Sashi Novitasari
Andros Tjandra
Tomoya Yanagita
S. Sakti
Satoshi Nakamura
CLL
14
1
0
04 Nov 2020
Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
19
3
0
04 Nov 2020
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion
Disong Wang
Songxiang Liu
Lifa Sun
Xixin Wu
Xunying Liu
Helen Meng
18
8
0
03 Nov 2020
FeatherTTS: Robust and Efficient attention based Neural TTS
Qiao Tian
Zewang Zhang
Chao-Jung Liu
Heng Lu
Linghui Chen
Bin Wei
P. He
Shan Liu
26
4
0
02 Nov 2020
The IQIYI System for Voice Conversion Challenge 2020
Wendong Gan
Haitao Chen
Yin Yan
Jianwei Li
Bolong Wen
Xueping Xu
Hai Li
13
0
0
29 Oct 2020
DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech
Zhiying Huang
Hao Li
Ming Lei
14
11
0
29 Oct 2020
PPG-based singing voice conversion with adversarial representation learning
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Linjia Xu
Chen Shen
Zejun Ma
19
37
0
28 Oct 2020
TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis
Min-Jae Hwang
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
26
32
0
26 Oct 2020
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition
Xiong Cai
Dongyang Dai
Zhiyong Wu
Xiang Li
Jingbei Li
Helen Meng
14
66
0
26 Oct 2020
GSEP: A robust vocal and accompaniment separation system using gated CBHG module and loudness normalization
S. Park
Ben Sangbae Chon
19
2
0
23 Oct 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
41
219
0
22 Oct 2020
The NTU-AISG Text-to-speech System for Blizzard Challenge 2020
Haobo Zhang
Tingzhi Mao
Haihua Xu
Hao-Ming Huang
15
1
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
30
102
0
22 Oct 2020
NU-GAN: High resolution neural upsampling with GAN
Rithesh Kumar
Kundan Kumar
Vicki Anand
Yoshua Bengio
Aaron Courville
27
25
0
22 Oct 2020
An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems
Antoine Perquin
Erica Cooper
Junichi Yamagishi
14
1
0
21 Oct 2020
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training
Renjie Zheng
Mingbo Ma
Baigong Zheng
Kaibo Liu
Jiahong Yuan
Kenneth Church
Liang Huang
18
14
0
20 Oct 2020
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
31
16
0
19 Oct 2020
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion
Shengkui Zhao
Trung Hieu Nguyen
Hao Wang
B. Ma
18
25
0
16 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen
Ye Jia
Mike Chrzanowski
Yu Zhang
Isaac Elias
Heiga Zen
Yonghui Wu
27
112
0
08 Oct 2020
JSSS: free Japanese speech corpus for summarization and simplification
Shinnosuke Takamichi
Mamoru Komachi
Naoko Tanji
Hiroshi Saruwatari
8
1
0
05 Oct 2020
Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion
Che-Jui Chang
20
5
0
30 Sep 2020
Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data
Mingyang Zhang
Yi Zhou
Li Zhao
Haizhou Li
24
53
0
30 Sep 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
36
1,402
0
21 Sep 2020
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
Yukiya Hono
Kazuna Tsuboi
Kei Sawada
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
BDL
11
24
0
17 Sep 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features
T. Raitio
Ramya Rasipuram
D. Castellani
42
66
0
14 Sep 2020
Visual-speech Synthesis of Exaggerated Corrective Feedback
Yaohua Bu
Weijun Li
Tianyi Ma
S. Chen
Jia Jia
Kun Li
Xiaobo Lu
16
1
0
12 Sep 2020
Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling
Songxiang Liu
Yuewen Cao
Disong Wang
Xixin Wu
Xunying Liu
Helen Meng
BDL
29
88
0
06 Sep 2020
What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS
Brooke Stephenson
Laurent Besacier
Laurent Girin
Thomas Hueber
26
13
0
04 Sep 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
21
92
0
03 Sep 2020
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer
Jing-Xuan Zhang
Li-Juan Liu
Yan-Nian Chen
Ya-Jun Hu
Yuan Jiang
Zhenhua Ling
Lirong Dai
19
17
0
03 Sep 2020
WaveGrad: Estimating Gradients for Waveform Generation
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
William Chan
DiffM
BDL
19
776
0
02 Sep 2020
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
Yi Zhao
Wen-Chin Huang
Xiaohai Tian
Junichi Yamagishi
Rohan Kumar Das
Tomi Kinnunen
Zhenhua Ling
Tomoki Toda
27
206
0
28 Aug 2020
Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning
Noé Tits
Kevin El Haddad
Thierry Dutoit
17
14
0
20 Aug 2020
Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders
Mingjie Chen
Thomas Hain
SSL
DRL
19
15
0
16 Aug 2020
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
Hyun-Wook Yoon
Sang-Hoon Lee
Hyeong-Rae Noh
Seong-Whan Lee
20
11
0
16 Aug 2020
LSTM Acoustic Models Learn to Align and Pronounce with Graphemes
A. Datta
Guanlong Zhao
Bhuvana Ramabhadran
Eugene Weinstein
23
0
0
13 Aug 2020
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion
D. Paul
M. Shifas
Yannis Pantazis
Y. Stylianou
14
21
0
13 Aug 2020
Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages
Haitong Zhang
Yue Lin
6
30
0
11 Aug 2020
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training
Jian Cong
Shan Yang
Lei Xie
Guoqiao Yu
Guanglu Wan
32
30
0
10 Aug 2020
Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions
D. Paul
Yannis Pantazis
Y. Stylianou
DRL
18
29
0
09 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
23
90
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
48
319
0
09 Aug 2020