ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.09047
  4. Cited By
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with
  Tacotron

Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron

24 March 2018
RJ Skerry-Ryan
Eric Battenberg
Y. Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
R. Clark
Rif A. Saurous
ArXivPDFHTML

Papers citing "Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron"

39 / 139 papers shown
Title
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech
  Synthesis
GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis
Rui Liu
Berrak Sisman
Haizhou Li
18
24
0
23 Oct 2020
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Parallel Tacotron: Non-Autoregressive and Controllable TTS
Isaac Elias
Heiga Zen
Jonathan Shen
Yu Zhang
Ye Jia
Ron J. Weiss
Yonghui Wu
DRL
30
102
0
22 Oct 2020
Replacing Human Audio with Synthetic Audio for On-device Unspoken
  Punctuation Prediction
Replacing Human Audio with Synthetic Audio for On-device Unspoken Punctuation Prediction
Daria Soboleva
Ondrej Skopek
Márius vSajgalík
Victor Cuarbune
Felix Weissenberger
...
B. Prisacari
Daniel Valcarce
Justin Lu
Rohit Prabhavalkar
Balint Miklos
39
9
0
20 Oct 2020
Controllable neural text-to-speech synthesis using intuitive prosodic
  features
Controllable neural text-to-speech synthesis using intuitive prosodic features
T. Raitio
Ramya Rasipuram
D. Castellani
42
66
0
14 Sep 2020
An Overview of Voice Conversion and its Challenges: From Statistical
  Modeling to Deep Learning
An Overview of Voice Conversion and its Challenges: From Statistical Modeling to Deep Learning
Berrak Sisman
Junichi Yamagishi
Simon King
Haizhou Li
BDL
43
318
0
09 Aug 2020
Expressive TTS Training with Frame and Style Reconstruction Loss
Expressive TTS Training with Frame and Style Reconstruction Loss
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
42
73
0
04 Aug 2020
Audiovisual Speech Synthesis using Tacotron2
Audiovisual Speech Synthesis using Tacotron2
Ahmed Hussen Abdelaziz
Anushree Prasanna Kumar
Chloe Seivwright
Gabriele Fanelli
Justin Binder
Y. Stylianou
S. Kajarekar
20
15
0
03 Aug 2020
Deep generative models for musical audio synthesis
Deep generative models for musical audio synthesis
M. Huzaifah
L. Wyse
27
20
0
10 Jun 2020
MultiSpeech: Multi-Speaker Text to Speech with Transformer
MultiSpeech: Multi-Speaker Text to Speech with Transformer
Mingjian Chen
Xu Tan
Yi Ren
Jin Xu
Hao Sun
Sheng Zhao
Tao Qin
Tie-Yan Liu
26
109
0
08 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment
  Search
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
54
478
0
22 May 2020
Pitchtron: Towards audiobook generation from ordinary people's voices
Pitchtron: Towards audiobook generation from ordinary people's voices
Sunghee Jung
Hoi-Rim Kim
16
5
0
21 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based
  Variable-Length Embedding
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
37
66
0
18 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for
  Text-to-Speech Synthesis
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
23
119
0
12 May 2020
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice
  Conversion without Parallel Data
Cotatron: Transcription-Guided Speech Encoder for Any-to-Many Voice Conversion without Parallel Data
Seung-won Park
Doo-young Kim
Myun-chul Joe
29
40
0
07 May 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech
GraphTTS: graph-to-sequence modelling in neural text-to-speech
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Jing Xiao
24
21
0
04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable
  speech synthesis
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
16
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized
  fine-grained VAE and auto-regressive prosody prior
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
36
92
0
06 Feb 2020
Emotional Voice Conversion using Multitask Learning with Text-to-speech
Emotional Voice Conversion using Multitask Learning with Text-to-speech
Tae-Ho Kim
Sungjae Cho
Shinkook Choi
Sejik Park
Soo-Young Lee
27
37
0
11 Nov 2019
A unified sequence-to-sequence front-end model for Mandarin
  text-to-speech synthesis
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
Junjie Pan
Xiang Yin
Zhiling Zhang
Shichao Liu
Yang Zhang
Zejun Ma
Yuxuan Wang
14
26
0
11 Nov 2019
Teacher-Student Training for Robust Tacotron-based TTS
Teacher-Student Training for Robust Tacotron-based TTS
Rui Liu
Berrak Sisman
Jingdong Li
F. Bao
Guanglai Gao
Haizhou Li
19
38
0
07 Nov 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on
  rhythm, pitch and global style tokens
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
25
148
0
26 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle
  Consistency
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
M. Whitehill
Shuang Ma
Daniel J. McDuff
Yale Song
28
35
0
25 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source
  End-to-End Text-to-Speech Toolkit
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
T. Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
29
202
0
24 Oct 2019
Location-Relative Attention Mechanisms For Robust Long-Form Speech
  Synthesis
Location-Relative Attention Mechanisms For Robust Long-Form Speech Synthesis
Eric Battenberg
RJ Skerry-Ryan
Soroosh Mariooryad
Daisy Stanton
David Kao
Matt Shannon
Tom Bagby
33
113
0
23 Oct 2019
Sequence to Sequence Neural Speech Synthesis with Prosody Modification
  Capabilities
Sequence to Sequence Neural Speech Synthesis with Prosody Modification Capabilities
Slava Shechtman
A. Sorin
22
33
0
23 Sep 2019
A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression
  Capabilities
A High-Fidelity Open Embodied Avatar with Lip Syncing and Expression Capabilities
Deepali Aneja
Daniel J. McDuff
S. Shah
14
38
0
19 Sep 2019
Maximizing Mutual Information for Tacotron
Maximizing Mutual Information for Tacotron
Peng Liu
Xixin Wu
Shiyin Kang
Guangzhi Li
Dan Su
Dong Yu
22
16
0
30 Aug 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic
  Speech -- a Deep Learning approach
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
Noé Tits
16
10
0
05 Jul 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and
  Semi-Supervised Training
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training
Peng Wu
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Hong-Chuan Wu
Lirong Dai
10
72
0
26 Jun 2019
ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent
  Generative Adversarial Networks
ET-GAN: Cross-Language Emotion Transfer Based on Cycle-Consistent Generative Adversarial Networks
Xiaoqi Jia
Jianwei Tai
Hang Zhou
Yakai Li
Weijuan Zhang
Haichao Du
Qingjia Huang
GAN
22
6
0
27 May 2019
Building a mixed-lingual neural TTS system with only monolingual data
Building a mixed-lingual neural TTS system with only monolingual data
Liumeng Xue
Wei Song
Guanghui Xu
Lei Xie
Zhizheng Wu
17
30
0
12 Apr 2019
Multi-reference Tacotron by Intercross Training for Style
  Disentangling,Transfer and Control in Speech Synthesis
Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis
Yanyao Bian
Changbin Chen
Yongguo Kang
Zhenglin Pan
18
46
0
04 Apr 2019
Style Transfer and Extraction for the Handwritten Letters Using Deep
  Learning
Style Transfer and Extraction for the Handwritten Letters Using Deep Learning
Omar Mohammed
Gérard Bailly
D. Pellier
12
1
0
10 Dec 2018
Robust and fine-grained prosody control of end-to-end speech synthesis
Robust and fine-grained prosody control of end-to-end speech synthesis
Younggun Lee
Jonathan Le Roux
11
147
0
06 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text
  Translation
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Ye Jia
Melvin Johnson
Wolfgang Macherey
Ron J. Weiss
Yuan Cao
Chung-Cheng Chiu
Naveen Ari
Stella Laurenzo
Yonghui Wu
31
159
0
05 Nov 2018
Sample Efficient Adaptive Text-to-Speech
Sample Efficient Adaptive Text-to-Speech
Yutian Chen
Yannis Assael
Brendan Shillingford
David Budden
Scott E. Reed
...
Ben Laurie
Çağlar Gülçehre
Aaron van den Oord
Oriol Vinyals
Nando de Freitas
35
149
0
27 Sep 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech
  Synthesis
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
Daisy Stanton
Yuxuan Wang
RJ Skerry-Ryan
18
122
0
04 Aug 2018
Transfer Learning from Speaker Verification to Multispeaker
  Text-To-Speech Synthesis
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhehuai Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
207
821
0
12 Jun 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in
  End-to-End Speech Synthesis
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
38
815
0
23 Mar 2018
Previous
123