ResearchTrend.AI
Tacotron: Towards End-to-End Speech Synthesis

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhehuai Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
arXiv:1703.10135

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 817 papers shown
Text-to-Speech Synthesis Techniques for MIDI-to-Audio Synthesis
Erica Cooper
Xin Wang
Junichi Yamagishi
39
6
0
25 Apr 2021
Review of end-to-end speech synthesis technology based on deep learning
Zhaoxi Mu
Xinyu Yang
Yizhuo Dong
AuLLM
ALM
26
24
0
20 Apr 2021
AdaSpeech 2: Adaptive Text to Speech with Untranscribed Data
Yuzi Yan
Xu Tan
Bohan Li
Tao Qin
Sheng Zhao
Yuan-Chung Shen
Tie-Yan Liu
20
45
0
20 Apr 2021
KazakhTTS: An Open-Source Kazakh Text-to-Speech Synthesis Dataset
Saida Mussakhojayeva
Aigerim Janaliyeva
A. Mirzakhmetov
Yerbolat Khassanov
H. A. Varol
17
14
0
17 Apr 2021
TalkNet 2: Non-Autoregressive Depth-Wise Separable Convolutional Model for Speech Synthesis with Explicit Pitch and Duration Prediction
Stanislav Beliaev
Boris Ginsburg
27
8
0
16 Apr 2021
FastS2S-VC: Streaming Non-Autoregressive Sequence-to-Sequence Voice Conversion
Hirokazu Kameoka
Kou Tanaka
Takuhiro Kaneko
39
21
0
14 Apr 2021
Enhancing Word-Level Semantic Representation via Dependency Structure for Expressive Text-to-Speech Synthesis
Yixuan Zhou
Changhe Song
Jingbei Li
Zhiyong Wu
Yanyao Bian
Dan Su
Helen Meng
46
6
0
14 Apr 2021
Non-autoregressive sequence-to-sequence voice conversion
Tomoki Hayashi
Wen-Chin Huang
Kazuhiro Kobayashi
Tomoki Toda
14
23
0
14 Apr 2021
Generalized Spoofing Detection Inspired from Audio Generation Artifacts
Yang Gao
Tyler Vuong
Mahsa Elyasi
Gaurav Bharaj
Rita Singh
26
20
0
08 Apr 2021
Flavored Tacotron: Conditional Learning for Prosodic-linguistic Features
Mahsa Elyasi
Gaurav Bharaj
19
2
0
08 Apr 2021
Phoneme-based Distribution Regularization for Speech Enhancement
Yajing Liu
Xiulian Peng
Zhiwei Xiong
Yan Lu
10
4
0
08 Apr 2021
Half-Truth: A Partially Fake Audio Detection Dataset
Jiangyan Yi
Ye Bai
J. Tao
Haoxin Ma
Zhengkun Tian
Chenglong Wang
Tao Wang
Ruibo Fu
21
82
0
08 Apr 2021
Towards Multi-Scale Style Control for Expressive Speech Synthesis
Xiang Li
Changhe Song
Jingbei Li
Zhiyong Wu
Jia Jia
Helen Meng
25
47
0
08 Apr 2021
Diff-TTS: A Denoising Diffusion Model for Text-to-Speech
Myeonghun Jeong
Hyeongju Kim
Sung Jun Cheon
Byoung Jin Choi
N. Kim
DiffM
25
191
0
03 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability
Rui Liu
Berrak Sisman
Haizhou Li
39
32
0
03 Apr 2021
Attention Forcing for Machine Translation
Qingyun Dou
Yiting Lu
Potsawee Manakul
Xixin Wu
Mark Gales
33
7
0
02 Apr 2021
Multi-rate attention architecture for fast streamable Text-to-speech spectrum modeling
Qing He
Zhiping Xiu
T. Koehler
Jilong Wu
24
7
0
01 Apr 2021
Fast DCTTS: Efficient Deep Convolutional Text-to-Speech
M. Kang
Jihyun Lee
Simin Kim
Injung Kim
8
6
0
01 Apr 2021
Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training
Kun Zhou
Berrak Sisman
Haizhou Li
28
27
0
31 Mar 2021
PnG BERT: Augmented BERT on Phonemes and Graphemes for Neural TTS
Ye Jia
Heiga Zen
Jonathan Shen
Yu Zhang
Yonghui Wu
SSL
52
81
0
28 Mar 2021
Continual Speaker Adaptation for Text-to-Speech Synthesis
Hamed Hemati
Damian Borth
CLL
27
9
0
26 Mar 2021
STYLER: Style Factor Modeling with Rapidity and Robustness via Speech Decomposition for Expressive and Controllable Neural Text to Speech
Keon Lee
Kyumin Park
Daeyoung Kim
24
30
0
17 Mar 2021
Analysis and Assessment of Controllability of an Expressive Deep Learning-based TTS system
Noé Tits
Kevin El Haddad
Thierry Dutoit
24
5
0
06 Mar 2021
AdaSpeech: Adaptive Text to Speech for Custom Voice
Mingjian Chen
Xu Tan
Bohan Li
Yanqing Liu
Tao Qin
Sheng Zhao
Tie-Yan Liu
VLM
DiffM
42
188
0
01 Mar 2021
Deepfakes Generation and Detection: State-of-the-art, open challenges, countermeasures, and way forward
Momina Masood
M. Nawaz
K. Malik
A. Javed
Aun Irtaza
AAML
128
299
0
25 Feb 2021
AudioVisual Speech Synthesis: A brief literature review
Efthymios Georgiou
Athanasios Katsamanis
21
0
0
18 Feb 2021
VARA-TTS: Non-Autoregressive Text-to-Speech Synthesis based on Very Deep VAE with Residual Attention
Peng Liu
Yuewen Cao
Songxiang Liu
Na Hu
Guangzhi Li
Chao Weng
Dan Su
42
22
0
12 Feb 2021
Onoma-to-wave: Environmental sound synthesis from onomatopoeic words
Yuki Okamoto
Keisuke Imoto
Shinnosuke Takamichi
Ryosuke Yamanishi
Takahiro Fukumori
Y. Yamashita
13
14
0
11 Feb 2021
LightSpeech: Lightweight and Fast Text to Speech with Neural Architecture Search
Renqian Luo
Xu Tan
Rui Wang
Tao Qin
Jinzhu Li
Sheng Zhao
Enhong Chen
Tie-Yan Liu
22
58
0
08 Feb 2021
Rich Prosody Diversity Modelling with Phone-level Mixture Density Network
Chenpeng Du
K. Yu
36
17
0
01 Feb 2021
Triple M: A Practical Text-to-speech Synthesis System With Multi-guidance Attention And Multi-band Multi-time LPCNet
Shilu Lin
Fenglong Xie
Li Meng
Xinhui Li
Li Lu
11
0
0
30 Jan 2021
Expressive Neural Voice Cloning
Paarth Neekhara
Shehzeen Samarah Hussain
Shlomo Dubnov
F. Koushanfar
Julian McAuley
DiffM
35
30
0
30 Jan 2021
High-Quality Vocoding Design with Signal Processing for Speech Synthesis and Voice Conversion
M. S. Al-Radhi
16
1
0
25 Jan 2021
Improved parallel WaveGAN vocoder with perceptually weighted spectrogram loss
Eunwoo Song
Ryuichi Yamamoto
Min-Jae Hwang
Jin-Seob Kim
Ohsung Kwon
Jae-Min Kim
19
14
0
19 Jan 2021
Whispered and Lombard Neural Speech Synthesis
Qiong Hu
T. Bleisch
Petko N. Petkov
T. Raitio
Erik Marchi
V. Lakshminarasimhan
12
14
0
13 Jan 2021
Fake Visual Content Detection Using Two-Stream Convolutional Neural Networks
B. Yousaf
Muhammad Usama
Waqas Sultani
Arif Mahmood
Junaid Qadir
25
8
0
03 Jan 2021
Text-Free Image-to-Speech Synthesis Using Learned Segmental Units
Wei-Ning Hsu
David Harwath
Christopher Song
James R. Glass
CLIP
37
66
0
31 Dec 2020
Unified Mandarin TTS Front-end Based on Distilled BERT Model
Yang Zhang
Liqun Deng
Yasheng Wang
21
24
0
31 Dec 2020
Building Multi lingual TTS using Cross Lingual Voice Conversion
Qinghua Sun
Kenji Nagamatsu
6
3
0
28 Dec 2020
Incremental Text-to-Speech Synthesis Using Pseudo Lookahead with Large Pretrained Language Model
Takaaki Saeki
Shinnosuke Takamichi
Hiroshi Saruwatari
8
16
0
23 Dec 2020
Syntactic representation learning for neural network based TTS with syntactic parse tree traversal
Changhe Song
Jingbei Li
Yixuan Zhou
Zhiyong Wu
Helen Meng
30
6
0
13 Dec 2020
I'm Sorry for Your Loss: Spectrally-Based Audio Distances Are Bad at Pitch
Joseph P. Turian
Max Henry
24
29
0
08 Dec 2020
EfficientTTS: An Efficient and High-Quality Text-to-Speech Architecture
Chenfeng Miao
Shuang Liang
Zhencheng Liu
Minchuan Chen
Jun Ma
Shaojun Wang
Jing Xiao
22
38
0
07 Dec 2020
MelGlow: Efficient Waveform Generative Network Based on Location-Variable Convolution
Zhen Zeng
Jianzong Wang
Ning Cheng
Jing Xiao
22
8
0
03 Dec 2020
GraphPB: Graphical Representations of Prosody Boundary in Speech Synthesis
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Lingwei Kong
Jing Xiao
16
9
0
03 Dec 2020
FBWave: Efficient and Scalable Neural Vocoders for Streaming Text-To-Speech on the Edge
Bichen Wu
Qing He
Peizhao Zhang
T. Koehler
Kurt Keutzer
Peter Vajda
31
6
0
25 Nov 2020
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
28
73
0
17 Nov 2020
Accent and Speaker Disentanglement in Many-to-many Voice Conversion
Zhichao Wang
Wenshuo Ge
Xiong Wang
Shan Yang
Wendong Gan
Haitao Chen
Hai Li
Lei Xie
Xiulin Li
CVBM
41
32
0
17 Nov 2020
s-Transformer: Segment-Transformer for Robust Neural Speech Synthesis
Xi Wang
Huaiping Ming
Lei He
Frank Soong
19
5
0
17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
27
55
0
17 Nov 2020