ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1803.09017
  4. Cited By
Style Tokens: Unsupervised Style Modeling, Control and Transfer in
  End-to-End Speech Synthesis

Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis

23 March 2018
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
ArXivPDFHTML

Papers citing "Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis"

50 / 195 papers shown
Title
Emotion Selectable End-to-End Text-based Speech Editing
Emotion Selectable End-to-End Text-based Speech Editing
Tao Wang
Jiangyan Yi
Ruibo Fu
J. Tao
Zhengqi Wen
Chu Yuan Zhang
33
2
0
20 Dec 2022
Contextual Expressive Text-to-Speech
Contextual Expressive Text-to-Speech
Jianhong Tu
Zeyu Cui
Xiaohuan Zhou
Siqi Zheng
Kaiqin Hu
Ju Fan
Chang Zhou
22
2
0
26 Nov 2022
PromptTTS: Controllable Text-to-Speech with Text Descriptions
PromptTTS: Controllable Text-to-Speech with Text Descriptions
Zhifang Guo
Yichong Leng
Yihan Wu
Sheng Zhao
Xuejiao Tan
DiffM
27
91
0
22 Nov 2022
Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection
Robust Vocal Quality Feature Embeddings for Dysphonic Voice Detection
Jianwei Zhang
J. Liss
Suren Jayasuriya
Visar Berisha
41
6
0
17 Nov 2022
OverFlow: Putting flows on top of neural transducers for better TTS
OverFlow: Putting flows on top of neural transducers for better TTS
Shivam Mehta
Ambika Kirkland
Harm Lameris
Jonas Beskow
Éva Székely
G. Henter
AI4TS
39
12
0
13 Nov 2022
Accented Text-to-Speech Synthesis with a Conditional Variational
  Autoencoder
Accented Text-to-Speech Synthesis with a Conditional Variational Autoencoder
J. Melechovský
Ambuj Mehrish
Berrak Sisman
Dorien Herremans
28
6
0
07 Nov 2022
Deliberation Networks and How to Train Them
Deliberation Networks and How to Train Them
Qingyun Dou
Mark Gales
24
0
0
06 Nov 2022
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for
  Noise-robust Expressive TTS
NoreSpeech: Knowledge Distillation based Conditional Diffusion Model for Noise-robust Expressive TTS
Dongchao Yang
Songxiang Liu
Jianwei Yu
Helin Wang
Chao Weng
Yuexian Zou
DiffM
VLM
43
18
0
04 Nov 2022
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New
  Speakers
Adapter-Based Extension of Multi-Speaker Text-to-Speech Model for New Speakers
Cheng-Ping Hsieh
Subhankar Ghosh
Boris Ginsburg
43
18
0
01 Nov 2022
Learning utterance-level representations through token-level acoustic
  latents prediction for Expressive Speech Synthesis
Learning utterance-level representations through token-level acoustic latents prediction for Expressive Speech Synthesis
Karolos Nikitaras
Konstantinos Klapsas
Nikolaos Ellinas
Georgia Maniati
June Sig Sung
Inchul Hwang
S. Raptis
Aimilios Chalamandaris
Pirros Tsiakoulis
19
0
0
01 Nov 2022
Combining Automatic Speaker Verification and Prosody Analysis for
  Synthetic Speech Detection
Combining Automatic Speaker Verification and Prosody Analysis for Synthetic Speech Detection
L. Attorresi
Davide Salvi
Clara Borrelli
Paolo Bestagini
Stefano Tubaro
21
22
0
31 Oct 2022
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR
  Challenge
The NPU-ASLP System for The ISCSLP 2022 Magichub Code-Swiching ASR Challenge
Yuhao Liang
Pei-Ning Chen
F. Yu
Xinfa Zhu
Tianyi Xu
Linfu Xie
33
0
0
26 Oct 2022
Pre-Avatar: An Automatic Presentation Generation Framework Leveraging
  Talking Avatar
Pre-Avatar: An Automatic Presentation Generation Framework Leveraging Talking Avatar
Aolan Sun
Xulong Zhang
Tiandong Ling
Jianzong Wang
Ning Cheng
Jing Xiao
35
4
0
13 Oct 2022
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data
  for Zero-Shot Multi-Speaker Text-to-Speech
Adversarial Speaker-Consistency Learning Using Untranscribed Speech Data for Zero-Shot Multi-Speaker Text-to-Speech
Byoung Jin Choi
Myeonghun Jeong
Minchan Kim
Sung Hwan Mun
N. Kim
DiffM
27
5
0
12 Oct 2022
An Overview of Affective Speech Synthesis and Conversion in the Deep
  Learning Era
An Overview of Affective Speech Synthesis and Conversion in the Deep Learning Era
Andreas Triantafyllopoulos
Björn W. Schuller
Gokcce .Iymen
M. Sezgin
Xiangheng He
...
Shuo Liu
Silvan Mertes
Elisabeth André
Ruibo Fu
Jianhua Tao
20
53
0
06 Oct 2022
Controllable Accented Text-to-Speech Synthesis
Controllable Accented Text-to-Speech Synthesis
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
42
6
0
22 Sep 2022
AutoLV: Automatic Lecture Video Generator
AutoLV: Automatic Lecture Video Generator
Wen Wang
Yang Song
Sanjay Jha
VGen
29
3
0
19 Sep 2022
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
ZeroEGGS: Zero-shot Example-based Gesture Generation from Speech
Saeed Ghorbani
Ylva Ferstl
Daniel Holden
N. Troje
M. Carbonneau
41
79
0
15 Sep 2022
The Role of Vocal Persona in Natural and Synthesized Speech
The Role of Vocal Persona in Natural and Synthesized Speech
Camille Noufi
Lloyd May
J. Berger
27
2
0
06 Sep 2022
Speech Synthesis with Mixed Emotions
Speech Synthesis with Mixed Emotions
Kun Zhou
Berrak Sisman
R. Rana
B.W.Schuller
Haizhou Li
27
44
0
11 Aug 2022
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Towards Cross-speaker Reading Style Transfer on Audiobook Dataset
Xiang Li
Changhe Song
X. Wei
Zhiyong Wu
Jia Jia
Helen Meng
29
4
0
10 Aug 2022
A Study of Modeling Rising Intonation in Cantonese Neural Speech
  Synthesis
A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis
Qibing Bai
Tom Ko
Yu Zhang
27
4
0
03 Aug 2022
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Controllable and Lossless Non-Autoregressive End-to-End Text-to-Speech
Zhengxi Liu
Qiao Tian
Chenxu Hu
Xudong Liu
Meng-Che Wu
Yuping Wang
Hang Zhao
Yuxuan Wang
36
10
0
13 Jul 2022
BERT, can HE predict contrastive focus? Predicting and controlling
  prominence in neural TTS using a language model
BERT, can HE predict contrastive focus? Predicting and controlling prominence in neural TTS using a language model
Brooke Stephenson
Laurent Besacier
Laurent Girin
Thomas Hueber
25
8
0
04 Jul 2022
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis
  using ranking support vector machine with variational autoencoder
TTS-by-TTS 2: Data-selective augmentation for neural speech synthesis using ranking support vector machine with variational autoencoder
Eunwoo Song
Ryuichi Yamamoto
Ohsung Kwon
Chan Song
Min-Jae Hwang
Suhyeon Oh
Hyun-Wook Yoon
Jin-Seob Kim
Jae-Min Kim
37
7
0
30 Jun 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for
  Speech Synthesis based on Disentanglement between Prosody and Timbre
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre
Guangyan Zhang
Ying Qin
Wenbo Zhang
Jialun Wu
Mei Li
Yu Gai
Feijun Jiang
Tan Lee
50
26
0
29 Jun 2022
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many
  Fine-Grained Prosody Transfer
CopyCat2: A Single Model for Multi-Speaker TTS and Many-to-Many Fine-Grained Prosody Transfer
S. Karlapati
Penny Karanasou
Mateusz Lajszczak
Ammar Abbas
Alexis Moinet
Peter Makarov
Raymond Li
Arent van Korlaar
Simon Slangen
Thomas Drugman
26
15
0
27 Jun 2022
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
Generating Diverse Vocal Bursts with StyleGAN2 and MEL-Spectrograms
Marco Jiralerspong
Gauthier Gidel
VLM
27
3
0
25 Jun 2022
Self-supervised Context-aware Style Representation for Expressive Speech
  Synthesis
Self-supervised Context-aware Style Representation for Expressive Speech Synthesis
Yihan Wu
Xi Wang
S. Zhang
Lei He
Ruihua Song
J. Nie
42
15
0
25 Jun 2022
End-to-End Text-to-Speech Based on Latent Representation of Speaking
  Styles Using Spontaneous Dialogue
End-to-End Text-to-Speech Based on Latent Representation of Speaking Styles Using Spontaneous Dialogue
Kentaro Mitsui
Tianyu Zhao
Kei Sawada
Yukiya Hono
Yoshihiko Nankaku
K. Tokuda
33
14
0
24 Jun 2022
Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Automatic Prosody Annotation with Pre-Trained Text-Speech Model
Ziqian Dai
Jianwei Yu
Yan Wang
Nuo Chen
Yanyao Bian
Guangzhi Li
Deng Cai
Dong Yu
214
7
0
16 Jun 2022
StyleTTS: A Style-Based Generative Model for Natural and Diverse
  Text-to-Speech Synthesis
StyleTTS: A Style-Based Generative Model for Natural and Diverse Text-to-Speech Synthesis
Yinghao Aaron Li
Cong Han
N. Mesgarani
42
38
0
30 May 2022
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain
  Text-to-Speech
GenerSpeech: Towards Style Transfer for Generalizable Out-Of-Domain Text-to-Speech
Rongjie Huang
Yi Ren
Jinglin Liu
Chenye Cui
Zhou Zhao
OODD
VLM
117
34
0
15 May 2022
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level
  Quality
NaturalSpeech: End-to-End Text to Speech Synthesis with Human-Level Quality
Xu Tan
Jiawei Chen
Haohe Liu
Jian Cong
Chen Zhang
...
Lei He
Frank Soong
Tao Qin
Sheng Zhao
Tie-Yan Liu
44
213
0
09 May 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022
The Sillwood Technologies System for the VoiceMOS Challenge 2022
Jiameng Gao
30
0
0
08 Apr 2022
Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch
  Encoder
Expressive Singing Synthesis Using Local Style Token and Dual-path Pitch Encoder
Juheon Lee
Hyeong-Seok Choi
Kyogu Lee
14
5
0
07 Apr 2022
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context
  Information for Mandarin Speech Synthesis
Towards Multi-Scale Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Jiankun Hu
Zhiyong Wu
Shiyin Kang
Helen Meng
27
10
0
06 Apr 2022
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
AdaSpeech 4: Adaptive Text to Speech in Zero-Shot Scenarios
Yihan Wu
Xu Tan
Bohan Li
Lei He
Sheng Zhao
Ruihua Song
Tao Qin
Tie-Yan Liu
VLM
DiffM
19
67
0
01 Apr 2022
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly
  Voice Agent
STUDIES: Corpus of Japanese Empathetic Dialogue Speech Towards Friendly Voice Agent
Yuki Saito
Yuto Nishimura
Shinnosuke Takamichi
Kentaro Tachibana
Hiroshi Saruwatari
19
12
0
28 Mar 2022
vTTS: visual-text to speech
vTTS: visual-text to speech
Yoshifumi Nakano
Takaaki Saeki
Shinnosuke Takamichi
Katsuhito Sudoh
Hiroshi Saruwatari
18
4
0
28 Mar 2022
Towards Expressive Speaking Style Modelling with Hierarchical Context
  Information for Mandarin Speech Synthesis
Towards Expressive Speaking Style Modelling with Hierarchical Context Information for Mandarin Speech Synthesis
Shunwei Lei
Yixuan Zhou
Liyang Chen
Zhiyong Wu
Shiyin Kang
Helen Meng
28
12
0
23 Mar 2022
Speaker Adaption with Intuitive Prosodic Features for Statistical
  Parametric Speech Synthesis
Speaker Adaption with Intuitive Prosodic Features for Statistical Parametric Speech Synthesis
Pengyu Cheng
Zhenhua Ling
40
3
0
02 Mar 2022
ADD 2022: the First Audio Deep Synthesis Detection Challenge
ADD 2022: the First Audio Deep Synthesis Detection Challenge
Jiangyan Yi
Ruibo Fu
J. Tao
Shuai Nie
Haoxin Ma
...
Le Xu
Zhengqi Wen
Haizhou Li
Zheng Lian
Bin Liu
25
176
0
17 Feb 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer
Disentangling Style and Speaker Attributes for TTS Style Transfer
Xiaochun An
Frank Soong
Lei Xie
70
18
0
24 Jan 2022
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for
  Singing Voice Synthesis
Opencpop: A High-Quality Open Source Chinese Popular Song Corpus for Singing Voice Synthesis
Yu Wang
Xinsheng Wang
Pengcheng Zhu
Jie Wu
Hanzhao Li
Heyang Xue
Yongmao Zhang
Lei Xie
Mengxiao Bi
25
97
0
19 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for
  emotional speech synthesis
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis
Yinjiao Lei
Shan Yang
Xinsheng Wang
Lei Xie
27
73
0
17 Jan 2022
IQDUBBING: Prosody modeling based on discrete self-supervised speech
  representation for expressive voice conversion
IQDUBBING: Prosody modeling based on discrete self-supervised speech representation for expressive voice conversion
Wendong Gan
Bolong Wen
Yin Yan
Haitao Chen
Zhichao Wang
Hongqiang Du
Lei Xie
Kaixuan Guo
Hai Li
15
14
0
02 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker
  Single-style Training Data Scenarios
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios
Qicong Xie
Tao Li
Xinsheng Wang
Zhichao Wang
Lei Xie
Guoqiao Yu
Guanglu Wan
32
11
0
23 Dec 2021
V2C: Visual Voice Cloning
V2C: Visual Voice Cloning
Qi Chen
Yuanqing Li
Yuankai Qi
Jiaqiu Zhou
Mingkui Tan
Qi Wu
VGen
33
24
0
25 Nov 2021
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End
  Speech Synthesis
Prosodic Clustering for Phoneme-level Prosody Control in End-to-End Speech Synthesis
Alexandra Vioni
Myrsini Christidou
Nikolaos Ellinas
G. Vamvoukakis
Panos Kakoulidis
Taehoon Kim
June Sig Sung
Hyoungmin Park
Aimilios Chalamandaris
Pirros Tsiakoulis
19
11
0
19 Nov 2021
Previous
1234
Next