ResearchTrend.AI
Tacotron: Towards End-to-End Speech Synthesis

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhehuai Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
arXiv: 1703.10135

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 817 papers shown
Pretraining Techniques for Sequence-to-Sequence Voice Conversion
Wen-Chin Huang
Tomoki Hayashi
Yi-Chiao Wu
Hirokazu Kameoka
Tomoki Toda
27
38
0
07 Aug 2020
Peking Opera Synthesis via Duration Informed Attention Network
Yusong Wu
Shengchen Li
Chengzhu Yu
Heng Lu
Chao Weng
Liqiang Zhang
Dong Yu
16
10
0
07 Aug 2020
DurIAN-SC: Duration Informed Attention Network based Singing Voice Conversion System
Liqiang Zhang
Chengzhu Yu
Heng Lu
Chao Weng
Chunlei Zhang
Yusong Wu
Xiang Xie
Zijin Li
Dong Yu
30
34
0
07 Aug 2020
HooliGAN: Robust, High Quality Neural Vocoding
Ollie McCarthy
Zo Ahmed
24
14
0
06 Aug 2020
PPSpeech: Phrase based Parallel End-to-End TTS System
Yahuan Cong
Ran Zhang
Jian Luan
34
3
0
06 Aug 2020
Recognition-Synthesis Based Non-Parallel Voice Conversion with Adversarial Learning
Jing-Xuan Zhang
Zhenhua Ling
Lirong Dai
15
6
0
05 Aug 2020
Audiovisual Speech Synthesis using Tacotron2
Ahmed Hussen Abdelaziz
Anushree Prasanna Kumar
Chloe Seivwright
Gabriele Fanelli
Justin Binder
Y. Stylianou
S. Kajarekar
20
15
0
03 Aug 2020
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
Fengyu Yang
Shan Yang
Qinghua Wu
Yujun Wang
Lei Xie
39
5
0
03 Aug 2020
Speaking Speed Control of End-to-End Speech Synthesis using Sentence-Level Conditioning
Jaesung Bae
Hanbin Bae
Young-Sun Joo
Junmo Lee
Gyeong-Hoon Lee
Hoon-Young Cho
16
17
0
30 Jul 2020
Xiaomingbot: A Multilingual Robot News Reporter
Runxin Xu
Jun Cao
Mingxuan Wang
Jiaze Chen
Hao Zhou
...
Xiang Yin
Xijin Zhang
Songcheng Jiang
Yuxuan Wang
Lei Li
23
11
0
12 Jul 2020
LSTM and GPT-2 Synthetic Speech Transfer Learning for Speaker Recognition to Overcome Data Scarcity
Jordan J. Bird
Diego Resende Faria
Anikó Ekárt
C. Premebida
Pedro P. S. Ayrosa
25
5
0
01 Jul 2020
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis
Antti Suni
Sofoklis Kakouros
M. Vainio
J. Šimko
19
17
0
29 Jun 2020
Gamma Boltzmann Machine for Simultaneously Modeling Linear- and Log-amplitude Spectra
Toru Nakashika
Kohei Yatabe
13
0
0
24 Jun 2020
Audeo: Audio Generation for a Silent Performance Video
Kun Su
Xiulong Liu
Eli Shlizerman
VGen
33
67
0
23 Jun 2020
Articulatory-WaveNet: Autoregressive Model For Acoustic-to-Articulatory Inversion
Narjes Bozorg
Michael T. Johnson
18
1
0
22 Jun 2020
Embodied Self-supervised Learning by Coordinated Sampling and Training
Yifan Sun
Xihong Wu
SSL
30
7
0
20 Jun 2020
SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement
Luka Chkhetiani
Levan Bejanidze
25
1
0
13 Jun 2020
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
33
2
0
12 Jun 2020
Deep generative models for musical audio synthesis
M. Huzaifah
L. Wyse
32
20
0
10 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
29
109
0
08 Jun 2020
FastSpeech 2: Fast and High-Quality End-to-End Text to Speech
Yi Ren
Chenxu Hu
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
60
1,362
0
08 Jun 2020
End-to-End Adversarial Text-to-Speech
Jeff Donahue
Sander Dieleman
Mikolaj Binkowski
Erich Elsen
Karen Simonyan
22
186
0
05 Jun 2020
PJS: phoneme-balanced Japanese singing voice corpus
Junya Koguchi
Shinnosuke Takamichi
20
22
0
04 Jun 2020
An ASR Guided Speech Intelligibility Measure for TTS Model Selection
Arun Baby
Saranya Vinnaitherthan
Nagaraj Adiga
Pranav Jawale
Sumukh Badam
Sharath Adavanne
Srikanth Konjeti
9
7
0
02 Jun 2020
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Kazi Nazmul Haque
R. Rana
Björn W Schuller
DRL
31
12
0
01 Jun 2020
DeepSonar: Towards Effective and Robust Detection of AI-Synthesized Fake Voices
Run Wang
Felix Juefei Xu
Yihao Huang
Qing Guo
Xiaofei Xie
Lei Ma
Yang Liu
AAML
32
105
0
28 May 2020
NAUTILUS: a Versatile Voice Cloning System
Hieu-Thi Luong
Junichi Yamagishi
28
51
0
22 May 2020
Conversational End-to-End TTS for Voice Agent
Haohan Guo
Shaofei Zhang
Frank Soong
Lei He
Lei Xie
34
67
0
21 May 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
AI4TS
22
31
0
20 May 2020
Improving Accent Conversion with Reference Encoder and End-To-End Text-To-Speech
Wenjie Li
Benlai Tang
Xiang Yin
Yushi Zhao
Wei Li
Kang Wang
Hao Huang
Yuxuan Wang
Zejun Ma
14
13
0
19 May 2020
Deep Architecture Enhancing Robustness to Noise, Adversarial Attacks, and Cross-corpus Setting for Speech Emotion Recognition
S. Latif
R. Rana
Sara Khalifa
Raja Jurdak
Björn W. Schuller
46
28
0
18 May 2020
Many-to-Many Voice Transformer Network
Hirokazu Kameoka
Wen-Chin Huang
Kou Tanaka
Takuhiro Kaneko
Nobukatsu Hojo
Tomoki Toda
ViT
30
30
0
18 May 2020
Semi-supervised Learning for Multi-speaker Text-to-speech Synthesis Using Discrete Speech Representation
Tao Tu
Yuan-Jui Chen
Alexander H. Liu
Hung-yi Lee
33
7
0
16 May 2020
JDI-T: Jointly trained Duration Informed Transformer for Text-To-Speech without Explicit Alignment
D. Lim
Won Jang
Gyeonghwan O
Heayoung Park
Bongwan Kim
Jaesam Yoon
27
36
0
15 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
16
61
0
14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
28
119
0
12 May 2020
AdaDurIAN: Few-shot Adaptation for Neural Text-to-Speech with DurIAN
Zewang Zhang
Qiao Tian
Heng Lu
Ling-Hao Chen
Shan Liu
9
27
0
12 May 2020
DiscreTalk: Text-to-Speech as a Machine Translation Problem
Tomoki Hayashi
Shinji Watanabe
27
32
0
12 May 2020
CopyCat: Many-to-Many Fine-Grained Prosody Transfer for Neural Text-to-Speech
S. Karlapati
Alexis Moinet
Arnaud Joly
V. Klimkov
Daniel Sáez-Trigueros
Thomas Drugman
19
67
0
30 Apr 2020
Adversarial Feature Learning and Unsupervised Clustering based Speech Synthesis for Found Data with Acoustic and Textual Noise
Shan Yang
Yuxuan Wang
Lei Xie
21
9
0
28 Apr 2020
ByteSing: A Chinese Singing Voice Synthesis System Using Duration Allocated Encoder-Decoder Acoustic Models and WaveRNN Vocoders
Yu Gu
Xiang Yin
Yonghui Rao
Yuan Wan
Benlai Tang
Yang Zhang
Jitong Chen
Yuxuan Wang
Zejun Ma
28
70
0
23 Apr 2020
Utterance-level Sequential Modeling For Deep Gaussian Process Based Speech Synthesis Using Simple Recurrent Unit
Tomoki Koriyama
Hiroshi Saruwatari
BDL
26
5
0
22 Apr 2020
Transformer based Grapheme-to-Phoneme Conversion
Sevinj Yolchuyeva
Géza Németh
Bálint Gyires-Tóth
35
63
0
14 Apr 2020
Generating Multilingual Voices Using Speaker Space Translation Based on Bilingual Speaker Data
Soumi Maiti
Erik Marchi
Alistair Conkie
19
17
0
10 Apr 2020
Advancing Speech Synthesis using EEG
G. Krishna
Co Tran
Mason Carnahan
Ahmed H. Tewfik
27
11
0
09 Apr 2020
Emotional Video to Audio Transformation Using Deep Recurrent Neural Networks and a Neuro-Fuzzy System
Gwenaelle Cunha Sergio
Minho Lee
9
8
0
05 Apr 2020
Caption Generation of Robot Behaviors based on Unsupervised Learning of Action Segments
Koichiro Yoshino
Kohei Wakimoto
Yuta Nishimura
Satoshi Nakamura
14
8
0
23 Mar 2020
AlignTTS: Efficient Feed-Forward Text-to-Speech System without Explicit Alignment
Zhen Zeng
Jianzong Wang
Ning Cheng
Tian Xia
Jing Xiao
VLM
33
56
0
04 Mar 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Jing Xiao
24
21
0
04 Mar 2020
Semi-Supervised Neural Architecture Search
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Enhong Chen
Tie-Yan Liu
13
88
0
24 Feb 2020