ResearchTrend.AI


Tacotron: Towards End-to-End Speech Synthesis (arXiv:1703.10135)

29 March 2017
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
Navdeep Jaitly
Zongheng Yang
Y. Xiao
Zhehuai Chen
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous

Papers citing "Tacotron: Towards End-to-End Speech Synthesis"

50 / 817 papers shown
FakeAVCeleb: A Novel Audio-Video Multimodal Deepfake Dataset
Hasam Khalid
Shahroz Tariq
Minha Kim
Simon S. Woo
41
187
0
11 Aug 2021
AnyoneNet: Synchronized Speech and Talking Head Generation for Arbitrary Person
Xinsheng Wang
Qicong Xie
Jihua Zhu
Lei Xie
O. Scharenborg
31
16
0
09 Aug 2021
SpecMix : A Mixed Sample Data Augmentation method for Training with Time-Frequency Domain Features
Gwantae Kim
D. Han
Hanseok Ko
50
42
0
06 Aug 2021
An Empirical Study on End-to-End Singing Voice Synthesis with Encoder-Decoder Architectures
Dengfeng Ke
Yuxing Lu
Xudong Liu
Yanyan Xu
Jing Sun
Cheng-Hao Cai
32
0
0
06 Aug 2021
Applying the Information Bottleneck Principle to Prosodic Representation Learning
Guangyan Zhang
Ying Qin
Daxin Tan
Tan Lee
45
4
0
05 Aug 2021
Daft-Exprt: Cross-Speaker Prosody Transfer on Any Text for Expressive Speech Synthesis
Julian Zaïdi
Hugo Seuté
Benjamin van Niekerk
M. Carbonneau
34
20
0
04 Aug 2021
Information Sieve: Content Leakage Reduction in End-to-End Prosody For Expressive Speech Synthesis
Xudong Dai
Cheng Gong
Longbiao Wang
Kaili Zhang
6
2
0
04 Aug 2021
Creation and Detection of German Voice Deepfakes
Vanessa Barnekow
Dominik Binder
Niclas Kromrey
Pascal Munaretto
A. Schaad
Felix Schmieder
21
2
0
02 Aug 2021
End to End Bangla Speech Synthesis
Prithwiraj Bhattacharjee
Rajan Saha Raju
Arif Ahmad
M. S. Rahman
11
2
0
01 Aug 2021
A Survey on Audio Synthesis and Audio-Visual Multimodal Processing
Zhaofeng Shi
26
7
0
01 Aug 2021
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Shifeng Pan
Lei He
25
22
0
27 Jul 2021
Beyond Voice Identity Conversion: Manipulating Voice Attributes by Adversarial Learning of Structured Disentangled Representations
L. Benaroya
Nicolas Obin
Axel Roebel
16
5
0
26 Jul 2021
Facetron: A Multi-speaker Face-to-Speech Model based on Cross-modal Latent Representations
Seyun Um
Jihyun Kim
Jihyun Lee
Hong-Goo Kang
CVBM
21
4
0
26 Jul 2021
Interactive Storytelling for Children: A Case-study of Design and Development Considerations for Ethical Conversational AI
J. Chubb
S. Missaoui
S. Concannon
Liam Maloney
James Alfred Walker
21
29
0
20 Jul 2021
Human Perception of Audio Deepfakes
Nicolas Müller
Karla Markert
Konstantin Böttinger
27
49
0
20 Jul 2021
Learning De-identified Representations of Prosody from Raw Audio
J. Weston
R. Lenain
U. Meepegama
E. Fristed
SSL
37
15
0
17 Jul 2021
Direct speech-to-speech translation with discrete units
Ann Lee
Peng-Jen Chen
Changhan Wang
Jiatao Gu
Sravya Popuri
...
Yossi Adi
Qing He
Yun Tang
J. Pino
Wei-Ning Hsu
41
181
0
12 Jul 2021
VAENAR-TTS: Variational Auto-Encoder based Non-AutoRegressive Text-to-Speech Synthesis
Hui Lu
Zhiyong Wu
Xixin Wu
Xu Li
Shiyin Kang
Xunying Liu
Helen Meng
33
12
0
07 Jul 2021
AdaSpeech 3: Adaptive Text to Speech for Spontaneous Style
Yuzi Yan
Xu Tan
Bohan Li
Guangyan Zhang
Tao Qin
Sheng Zhao
Yuan-Chung Shen
Weiqiang Zhang
Tie-Yan Liu
17
21
0
06 Jul 2021
A Generative Model for Raw Audio Using Transformer Architectures
Prateek Verma
C. Chafe
32
28
0
30 Jun 2021
Multi-Scale Spectrogram Modelling for Neural Text-to-Speech
Ammar Abbas
Bajibabu Bollepalli
Alexis Moinet
Arnaud Joly
Penny Karanasou
Peter Makarov
Simon Slangens
S. Karlapati
Thomas Drugman
26
0
0
29 Jun 2021
A Survey on Neural Speech Synthesis
Xu Tan
Tao Qin
Frank Soong
Tie-Yan Liu
AI4TS
23
353
0
29 Jun 2021
N-Singer: A Non-Autoregressive Korean Singing Voice Synthesis System for Pronunciation Enhancement
Gyeong-Hoon Lee
Tae-Woo Kim
Hanbin Bae
Min-Ji Lee
Young-Ik Kim
Hoon-Young Cho
VLM
22
19
0
29 Jun 2021
GANSpeech: Adversarial Training for High-Fidelity Multi-Speaker Speech Synthesis
Jinhyeok Yang
Jaesung Bae
Taejun Bak
Young-Ik Kim
Hoon-Young Cho
34
36
0
29 Jun 2021
FastPitchFormant: Source-filter based Decomposed Modeling for Speech Synthesis
Taejun Bak
Jaesung Bae
Hanbin Bae
Young-Ik Kim
Hoon-Young Cho
34
16
0
29 Jun 2021
AI based Presentation Creator With Customized Audio Content Delivery
Muvazima Mansoor
Srikanth Chandar
Ramamoorthy Srinath
26
0
0
27 Jun 2021
UniTTS: Residual Learning of Unified Embedding Space for Speech Style Control
M. Kang
Sungjae Kim
Injung Kim
26
3
0
21 Jun 2021
Glow-WaveGAN: Learning Speech Representations from GAN-based Variational Auto-Encoder For High Fidelity Flow-based Speech Synthesis
Jian Cong
Shan Yang
Lei Xie
Dan Su
DRL
18
29
0
21 Jun 2021
Controllable Context-aware Conversational Speech Synthesis
Jian Cong
Shan Yang
Na Hu
Guangzhi Li
Lei Xie
Dan Su
25
30
0
21 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
49
9
0
18 Jun 2021
WaveGrad 2: Iterative Refinement for Text-to-Speech Synthesis
Nanxin Chen
Yu Zhang
Heiga Zen
Ron J. Weiss
Mohammad Norouzi
Najim Dehak
William Chan
DiffM
23
88
0
17 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model
Chenye Cui
Yi Ren
Jinglin Liu
Feiyang Chen
Rongjie Huang
Ming Lei
Zhou Zhao
24
35
0
17 Jun 2021
Enriching Source Style Transfer in Recognition-Synthesis based Non-Parallel Voice Conversion
Zhichao Wang
Xinyong Zhou
Fengyu Yang
Tao Li
Hongqiang Du
Lei Xie
Wendong Gan
Haitao Chen
Hai Li
32
22
0
16 Jun 2021
RyanSpeech: A Corpus for Conversational Text-to-Speech Synthesis
Rohola Zandie
Mohammad H. Mahoor
Julia Madsen
Eshrat S. Emamian
38
25
0
15 Jun 2021
A learned conditional prior for the VAE acoustic space of a TTS system
Panagiota Karanasou
S. Karlapati
Alexis Moinet
Arnaud Joly
Ammar Abbas
Simon Slangen
Jaime Lorenzo-Trueba
Thomas Drugman
40
7
0
14 Jun 2021
Continuous Wavelet Vocoder-based Decomposition of Parametric Speech Waveform Synthesis
M. S. Al-Radhi
Tamás Gábor Csapó
Csaba Zainkó
Géza Németh
21
3
0
12 Jun 2021
HUI-Audio-Corpus-German: A high quality TTS dataset
Pascal Puchtler
Johannes Wirth
René Peinl
14
21
0
11 Jun 2021
Enhancing Speaking Styles in Conversational Text-to-Speech Synthesis with Graph-based Multi-modal Context Modeling
Jingbei Li
Yi Meng
Chenyi Li
Zhiyong Wu
Helen Meng
Chao Weng
Dan Su
33
24
0
11 Jun 2021
Sprachsynthese -- State-of-the-Art in englischer und deutscher Sprache (Speech Synthesis: State of the Art in English and German)
René Peinl
35
0
0
11 Jun 2021
Improving multi-speaker TTS prosody variance with a residual encoder and normalizing flows
Iván Vallés-Pérez
Julian Roth
Grzegorz Beringer
Roberto Barra-Chicote
J. Droppo
39
8
0
10 Jun 2021
Speech BERT Embedding For Improving Prosody in Neural TTS
Liping Chen
Yan Deng
Xi Wang
Frank Soong
Lei He
25
22
0
08 Jun 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
25
160
0
06 Jun 2021
An objective evaluation of the effects of recording conditions and speaker characteristics in multi-speaker deep neural speech synthesis
Beáta Lőrincz
Adriana Stan
M. Giurgiu
29
2
0
03 Jun 2021
Speaker verification-derived loss and data augmentation for DNN-based multispeaker speech synthesis
Beáta Lőrincz
Adriana Stan
M. Giurgiu
29
6
0
03 Jun 2021
Recent Advances and Trends in Multimodal Deep Learning: A Review
Jabeen Summaira
Xi Li
Amin Muhammad Shoib
Songyuan Li
Abdul Jabbar
HAI
28
55
0
24 May 2021
ItôTTS and ItôWave: Linear Stochastic Differential Equation Is All You Need For Audio Generation
Shoule Wu
Ziqiang Shi
DiffM
27
11
0
17 May 2021
MASS: Multi-task Anthropomorphic Speech Synthesis Framework
Jinyin Chen
Linhui Ye
Zhaoyan Ming
23
6
0
10 May 2021
Exploring emotional prototypes in a high dimensional TTS latent space
Pol van Rijn
Silvan Mertes
Dominik Schiller
Peter M. C. Harrison
P. Larrouy-Maestri
Elisabeth André
Nori Jacoby
28
12
0
05 May 2021
End-to-End Video-To-Speech Synthesis using Generative Adversarial Networks
Rodrigo Mira
Konstantinos Vougioukas
Pingchuan Ma
Stavros Petridis
Björn W. Schuller
Maja Pantic
41
43
0
27 Apr 2021
Phrase break prediction with bidirectional encoder representations in Japanese text-to-speech synthesis
Kosuke Futamata
Byeong-Cheol Park
Ryuichi Yamamoto
Kentaro Tachibana
22
14
0
26 Apr 2021