ResearchTrend.AI
Tacotron: Towards End-to-End Speech Synthesis

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhehuai Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 817 papers shown
Speech Prediction in Silent Videos using Variational Autoencoders
Ravindra Yadav
Ashish Sardana
Vinay P. Namboodiri
R. Hegde
VGen
DRL
29
23
0
14 Nov 2020
Low-resource expressive text-to-speech using data augmentation
Goeric Huybrechts
Thomas Merritt
Giulia Comini
Bartek Perz
Raahil Shah
Jaime Lorenzo-Trueba
26
51
0
11 Nov 2020
Pretraining Strategies, Waveform Model Choice, and Acoustic Configurations for Multi-Speaker End-to-End Speech Synthesis
Erica Cooper
Xin Wang
Yi Zhao
Yusuke Yasuda
Junichi Yamagishi
SyDa
14
3
0
10 Nov 2020
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement
Daxin Tan
Tan Lee
34
21
0
08 Nov 2020
Wave-Tacotron: Spectrogram-free end-to-end text-to-speech synthesis
Ron J. Weiss
RJ Skerry-Ryan
Eric Battenberg
Soroosh Mariooryad
Diederik P. Kingma
24
98
0
06 Nov 2020
Improving Prosody Modelling with Cross-Utterance BERT Embeddings for End-to-end Speech Synthesis
Guanghui Xu
Wei Song
Zhengchen Zhang
Chao Zhang
Xiaodong He
Bowen Zhou
18
50
0
06 Nov 2020
Semi-supervised URL Segmentation with Recurrent Neural Networks Pre-trained on Knowledge Graph Entities
Hao Zhang
Jae Hun Ro
R. Sproat
6
13
0
05 Nov 2020
Prosodic Representation Learning and Contextual Sampling for Neural Text-to-Speech
S. Karlapati
Ammar Abbas
Zack Hodari
Alexis Moinet
Arnaud Joly
Panagiota Karanasou
Thomas Drugman
28
19
0
04 Nov 2020
Incremental Machine Speech Chain Towards Enabling Listening while Speaking in Real-time
Sashi Novitasari
Andros Tjandra
Tomoya Yanagita
S. Sakti
Satoshi Nakamura
CLL
14
1
0
04 Nov 2020
Augmenting Images for ASR and TTS through Single-loop and Dual-loop Multimodal Chain Framework
Johanes Effendi
Andros Tjandra
S. Sakti
Satoshi Nakamura
19
3
0
04 Nov 2020
Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion
Disong Wang
Songxiang Liu
Lifa Sun
Xixin Wu
Xunying Liu
Helen Meng
18
8
0
03 Nov 2020
FeatherTTS: Robust and Efficient attention based Neural TTS
Qiao Tian
Zewang Zhang
Chao-Jung Liu
Heng Lu
Linghui Chen
Bin Wei
P. He
Shan Liu
26
4
0
02 Nov 2020
The IQIYI System for Voice Conversion Challenge 2020
Wendong Gan
Haitao Chen
Yin Yan
Jianwei Li
Bolong Wen
Xueping Xu
Hai Li
13
0
0
29 Oct 2020
DeviceTTS: A Small-Footprint, Fast, Stable Network for On-Device Text-to-Speech
Zhiying Huang
Hao Li
Ming Lei
14
11
0
29 Oct 2020
PPG-based singing voice conversion with adversarial representation learning
Zhonghao Li
Benlai Tang
Xiang Yin
Yuan Wan
Linjia Xu
Chen Shen
Zejun Ma
19
37
0
28 Oct 2020
TTS-by-TTS: TTS-driven Data Augmentation for Fast and High-Quality Speech Synthesis
Min-Jae Hwang
Ryuichi Yamamoto
Eunwoo Song
Jae-Min Kim
26
32
0
26 Oct 2020
Emotion controllable speech synthesis using emotion-unlabeled dataset with the assistance of cross-domain speech emotion recognition
Xiong Cai
Dongyang Dai
Zhiyong Wu
Xiang Li
Jingbei Li
Helen Meng
14
66
0
26 Oct 2020
GSEP: A robust vocal and accompaniment separation system using gated CBHG module and loudness normalization
S. Park
Ben Sangbae Chon
19
2
0
23 Oct 2020
AISHELL-3: A Multi-speaker Mandarin TTS Corpus and the Baselines
Yao Shi
Hui Bu
Xin Xu
Shaojing Zhang
Ming Li
41
219
0
22 Oct 2020
The NTU-AISG Text-to-speech System for Blizzard Challenge 2020
Haobo Zhang
Tingzhi Mao
Haihua Xu
Hao-Ming Huang
15
1
0
22 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
30
102
0
22 Oct 2020
NU-GAN: High resolution neural upsampling with GAN
Rithesh Kumar
Kundan Kumar
Vicki Anand
Yoshua Bengio
Aaron Courville
27
25
0
22 Oct 2020
An Investigation of the Relation Between Grapheme Embeddings and Pronunciation for Tacotron-based Systems
Antoine Perquin
Erica Cooper
Junichi Yamagishi
14
1
0
21 Oct 2020
Fluent and Low-latency Simultaneous Speech-to-Speech Translation with Self-adaptive Training
Renjie Zheng
Mingbo Ma
Baigong Zheng
Kaibo Liu
Jiahong Yuan
Kenneth Church
Liang Huang
18
14
0
20 Oct 2020
End-to-End Text-to-Speech using Latent Duration based on VQ-VAE
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
31
16
0
19 Oct 2020
Towards Natural Bilingual and Code-Switched Speech Synthesis Based on Mix of Monolingual Recordings and Cross-Lingual Voice Conversion
Shengkui Zhao
Trung Hieu Nguyen
Hao Wang
B. Ma
18
25
0
16 Oct 2020
Non-Attentive Tacotron: Robust and Controllable Neural TTS Synthesis Including Unsupervised Duration Modeling
Jonathan Shen
Ye Jia
Mike Chrzanowski
Yu Zhang
Isaac Elias
Heiga Zen
Yonghui Wu
27
112
0
08 Oct 2020
JSSS: free Japanese speech corpus for summarization and simplification
Shinnosuke Takamichi
Mamoru Komachi
Naoko Tanji
Hiroshi Saruwatari
8
1
0
05 Oct 2020
Transfer Learning from Monolingual ASR to Transcription-free Cross-lingual Voice Conversion
Che-Jui Chang
20
5
0
30 Sep 2020
Transfer Learning from Speech Synthesis to Voice Conversion with Non-Parallel Training Data
Mingyang Zhang
Yi Zhou
Li Zhao
Haizhou Li
24
53
0
30 Sep 2020
DiffWave: A Versatile Diffusion Model for Audio Synthesis
Zhifeng Kong
Ming-Yu Liu
Jiaji Huang
Kexin Zhao
Bryan Catanzaro
DiffM
BDL
36
1,402
0
21 Sep 2020
Hierarchical Multi-Grained Generative Model for Expressive Speech Synthesis
Yukiya Hono
Kazuna Tsuboi
Kei Sawada
Kei Hashimoto
Keiichiro Oura
Yoshihiko Nankaku
K. Tokuda
BDL
11
24
0
17 Sep 2020
Controllable neural text-to-speech synthesis using intuitive prosodic features
T. Raitio
Ramya Rasipuram
D. Castellani
42
66
0
14 Sep 2020
Visual-speech Synthesis of Exaggerated Corrective Feedback
Yaohua Bu
Weijun Li
Tianyi Ma
S. Chen
Jia Jia
Kun Li
Xiaobo Lu
16
1
0
12 Sep 2020
Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling
Songxiang Liu
Yuewen Cao
Disong Wang
Xixin Wu
Xunying Liu
Helen Meng
BDL
29
88
0
06 Sep 2020
What the Future Brings: Investigating the Impact of Lookahead for Incremental Neural TTS
Brooke Stephenson
Laurent Besacier
Laurent Girin
Thomas Hueber
26
13
0
04 Sep 2020
HiFiSinger: Towards High-Fidelity Neural Singing Voice Synthesis
Jiawei Chen
Xu Tan
Jian Luan
Tao Qin
Tie-Yan Liu
VLM
21
92
0
03 Sep 2020
Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer
Jing-Xuan Zhang
Li-Juan Liu
Yan-Nian Chen
Ya-Jun Hu
Yuan Jiang
Zhenhua Ling
Lirong Dai
19
17
0
03 Sep 2020
WaveGrad: Estimating Gradients for Waveform Generation
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
William Chan
DiffM
BDL
19
776
0
02 Sep 2020
Voice Conversion Challenge 2020: Intra-lingual semi-parallel and cross-lingual voice conversion
Yi Zhao
Wen-Chin Huang
Xiaohai Tian
Junichi Yamagishi
Rohan Kumar Das
Tomi Kinnunen
Zhenhua Ling
Tomoki Toda
27
206
0
28 Aug 2020
Laughter Synthesis: Combining Seq2seq modeling with Transfer Learning
Noé Tits
Kevin El Haddad
Thierry Dutoit
17
14
0
20 Aug 2020
Unsupervised Acoustic Unit Representation Learning for Voice Conversion using WaveNet Auto-encoders
Mingjie Chen
Thomas Hain
SSL
DRL
19
15
0
16 Aug 2020
Audio Dequantization for High Fidelity Audio Generation in Flow-based Neural Vocoder
Hyun-Wook Yoon
Sang-Hoon Lee
Hyeong-Rae Noh
Seong-Whan Lee
20
11
0
16 Aug 2020
LSTM Acoustic Models Learn to Align and Pronounce with Graphemes
A. Datta
Guanlong Zhao
Bhuvana Ramabhadran
Eugene Weinstein
23
0
0
13 Aug 2020
Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion
D. Paul
M. Shifas
Yannis Pantazis
Y. Stylianou
14
21
0
13 Aug 2020
Unsupervised Learning For Sequence-to-sequence Text-to-speech For Low-resource Languages
Haitong Zhang
Yue Lin
6
30
0
11 Aug 2020
Data Efficient Voice Cloning from Noisy Samples with Domain Adversarial Training
Jian Cong
Shan Yang
Lei Xie
Guoqiao Yu
Guanglu Wan
32
30
0
10 Aug 2020
Speaker Conditional WaveRNN: Towards Universal Neural Vocoder for Unseen Speaker and Recording Conditions
D. Paul
Yannis Pantazis
Y. Stylianou
DRL
18
29
0
09 Aug 2020
LRSpeech: Extremely Low-Resource Speech Synthesis and Recognition
Jin Xu
Xu Tan
Yi Ren
Tao Qin
Jian Li
Sheng Zhao
Tie-Yan Liu
VLM
23
90
0
09 Aug 2020
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
48
319
0
09 Aug 2020