Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1803.09017
Cited By
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
23 March 2018
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"
44 / 194 papers shown
Title
Exploiting Deep Sentential Context for Expressive End-to-End Speech Synthesis
Fengyu Yang
Shan Yang
Qinghua Wu
Yujun Wang
Lei Xie
39
5
0
03 Aug 2020
Music FaderNets: Controllable Music Generation Based On High-Level Features via Low-Level Feature Modelling
Hao Hao Tan
Dorien Herremans
MGen
16
72
0
29 Jul 2020
Prosodic Prominence and Boundaries in Sequence-to-Sequence Speech Synthesis
Antti Suni
Sofoklis Kakouros
M. Vainio
J. Šimko
19
17
0
29 Jun 2020
Neural voice cloning with a few low-quality samples
Sunghee Jung
Hoi-Rim Kim
33
2
0
12 Jun 2020
Deep generative models for musical audio synthesis
M. Huzaifah
L. Wyse
27
20
0
10 Jun 2020
Glow-TTS: A Generative Flow for Text-to-Speech via Monotonic Alignment Search
Jaehyeon Kim
Sungwon Kim
Jungil Kong
Sungroh Yoon
54
477
0
22 May 2020
Pitchtron: Towards audiobook generation from ordinary people's voices
Sunghee Jung
Hoi-Rim Kim
16
5
0
21 May 2020
Investigation of learning abilities on linguistic features in sequence-to-sequence text-to-speech synthesis
Yusuke Yasuda
Xin Wang
Junichi Yamagishi
AI4TS
22
31
0
20 May 2020
Attentron: Few-Shot Text-to-Speech Utilizing Attention-Based Variable-Length Embedding
Seungwoo Choi
Seungju Han
Dongyoung Kim
S. Ha
37
65
0
18 May 2020
You Do Not Need More Data: Improving End-To-End Speech Recognition by Text-To-Speech Data Augmentation
A. Laptev
Roman Korostik
A. Svischev
A. Andrusenko
Ivan Medennikov
S. Rybin
16
61
0
14 May 2020
Flowtron: an Autoregressive Flow-based Generative Network for Text-to-Speech Synthesis
Rafael Valle
Kevin J. Shih
R. Prenger
Bryan Catanzaro
21
119
0
12 May 2020
From Speaker Verification to Multispeaker Speech Synthesis, Deep Transfer with Feedback Constraint
Zexin Cai
Chuxiong Zhang
Ming Li
24
41
0
10 May 2020
Jukebox: A Generative Model for Music
Prafulla Dhariwal
Heewoo Jun
Christine Payne
Jong Wook Kim
Alec Radford
Ilya Sutskever
VLM
55
724
0
30 Apr 2020
Unsupervised Style and Content Separation by Minimizing Mutual Information for Speech Synthesis
Ting-Yao Hu
A. Shrivastava
Oncel Tuzel
C. Dhir
11
30
0
09 Mar 2020
GraphTTS: graph-to-sequence modelling in neural text-to-speech
Aolan Sun
Jianzong Wang
Ning Cheng
Huayi Peng
Zhen Zeng
Jing Xiao
24
21
0
04 Mar 2020
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
16
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
36
92
0
06 Feb 2020
Generating Synthetic Audio Data for Attention-Based Speech Recognition Systems
Nick Rossenbach
Albert Zeyer
Ralf Schluter
Hermann Ney
18
83
0
19 Dec 2019
A unified sequence-to-sequence front-end model for Mandarin text-to-speech synthesis
Junjie Pan
Xiang Yin
Zhiling Zhang
Shichao Liu
Yang Zhang
Zejun Ma
Yuxuan Wang
9
26
0
11 Nov 2019
On Investigation of Unsupervised Speech Factorization Based on Normalization Flow
Haoran Sun
Yunqi Cai
Lantian Li
Dong Wang
21
1
0
29 Oct 2019
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
25
148
0
26 Oct 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
M. Whitehill
Shuang Ma
Daniel J. McDuff
Yale Song
25
35
0
25 Oct 2019
ESPnet-TTS: Unified, Reproducible, and Integratable Open Source End-to-End Text-to-Speech Toolkit
Tomoki Hayashi
Ryuichi Yamamoto
Katsuki Inoue
Takenori Yoshimura
Shinji Watanabe
T. Toda
K. Takeda
Yu Zhang
Xu Tan
VLM
29
202
0
24 Oct 2019
Attention Forcing for Sequence-to-sequence Model Training
Qingyun Dou
Yiting Lu
Joshua Efiong
Mark Gales
27
6
0
26 Sep 2019
Speech Recognition with Augmented Synthesized Speech
Andrew Rosenberg
Yu Zhang
Bhuvana Ramabhadran
Ye Jia
Pedro J. Moreno
Yonghui Wu
Zelin Wu
32
127
0
25 Sep 2019
DurIAN: Duration Informed Attention Network For Multimodal Synthesis
Chengzhu Yu
Heng Lu
Na Hu
Meng Yu
Chao Weng
...
Deyi Tuo
Shiyin Kang
Guangzhi Lei
Dan Su
Dong Yu
CVBM
19
118
0
04 Sep 2019
Maximizing Mutual Information for Tacotron
Peng Liu
Xixin Wu
Shiyin Kang
Guangzhi Li
Dan Su
Dong Yu
22
16
0
30 Aug 2019
Unpaired Image-to-Speech Synthesis with Multimodal Information Bottleneck
Shuang Ma
Daniel J. McDuff
Yale Song
25
22
0
19 Aug 2019
A Methodology for Controlling the Emotional Expressiveness in Synthetic Speech -- a Deep Learning approach
Noé Tits
16
10
0
05 Jul 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training
Peng Wu
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Hong-Chuan Wu
Lirong Dai
10
72
0
26 Jun 2019
Learning Disentangled Representations of Timbre and Pitch for Musical Instrument Sounds Using Gaussian Mixture Variational Autoencoders
Yin-Jyun Luo
Kat R. Agres
Dorien Herremans
22
46
0
19 Jun 2019
Using generative modelling to produce varied intonation for speech synthesis
Zack Hodari
O. Watts
Simon King
29
29
0
10 Jun 2019
CHiVE: Varying Prosody in Speech Synthesis with a Linguistically Driven Dynamic Hierarchical Conditional Variational Network
V. Wan
Chun-an Chan
Tom Kenter
Jakub Vít
R. Clark
24
75
0
17 May 2019
Almost Unsupervised Text to Speech and Automatic Speech Recognition
Yi Ren
Xu Tan
Tao Qin
Sheng Zhao
Zhou Zhao
Tie-Yan Liu
44
101
0
13 May 2019
Incorporating Symbolic Sequential Modeling for Speech Enhancement
Chien-Feng Liao
Yu Tsao
Xugang Lu
Hisashi Kawai
27
18
0
30 Apr 2019
Direct speech-to-speech translation with a sequence-to-sequence model
Ye Jia
Ron J. Weiss
Fadi Biadsy
Wolfgang Macherey
Melvin Johnson
Zhehuai Chen
Yonghui Wu
21
223
0
12 Apr 2019
Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis
Yanyao Bian
Changbin Chen
Yongguo Kang
Zhenglin Pan
18
46
0
04 Apr 2019
Robust and fine-grained prosody control of end-to-end speech synthesis
Younggun Lee
Jonathan Le Roux
9
147
0
06 Nov 2018
Leveraging Weakly Supervised Data to Improve End-to-End Speech-to-Text Translation
Ye Jia
Melvin Johnson
Wolfgang Macherey
Ron J. Weiss
Yuan Cao
Chung-Cheng Chiu
Naveen Ari
Stella Laurenzo
Yonghui Wu
31
159
0
05 Nov 2018
Semi-Supervised Training for Improving Data Efficiency in End-to-End Speech Synthesis
Yu-An Chung
Yuxuan Wang
Wei-Ning Hsu
Yu Zhang
RJ Skerry-Ryan
24
117
0
30 Aug 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
Daisy Stanton
Yuxuan Wang
RJ Skerry-Ryan
18
122
0
04 Aug 2018
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhehuai Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
207
820
0
12 Jun 2018
Conditional End-to-End Audio Transforms
Albert Haque
Michelle Guo
Prateek Verma
33
41
0
30 Mar 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
RJ Skerry-Ryan
Eric Battenberg
Y. Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
R. Clark
Rif A. Saurous
16
548
0
24 Mar 2018
Previous
1
2
3
4