ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2201.09472
  4. Cited By
Disentangling Style and Speaker Attributes for TTS Style Transfer

Disentangling Style and Speaker Attributes for TTS Style Transfer

24 January 2022
Xiaochun An
Frank Soong
Lei Xie
ArXivPDFHTML

Papers citing "Disentangling Style and Speaker Attributes for TTS Style Transfer"

50 / 54 papers shown
Title
Generative Adversarial Networks
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
217
30,089
0
01 Mar 2022
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech
  Synthesis
Cross-speaker Style Transfer with Prosody Bottleneck in Neural Speech Synthesis
Shifeng Pan
Lei He
49
23
0
27 Jul 2021
Improving Performance of Seen and Unseen Speech Style Transfer in
  End-to-end Neural TTS
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS
Xiaochun An
Frank Soong
Lei Xie
80
9
0
18 Jun 2021
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker
  Verification
Multi-Level Transfer Learning from Near-Field to Far-Field Speaker Verification
Li Zhang
Qing Wang
Kong Aik Lee
Lei Xie
Haizhou Li
47
13
0
17 Jun 2021
Controllable Emotion Transfer For End-to-End Speech Synthesis
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
45
74
0
17 Nov 2020
Fine-grained Emotion Strength Transfer, Control and Prediction for
  Emotional Speech Synthesis
Fine-grained Emotion Strength Transfer, Control and Prediction for Emotional Speech Synthesis
Yinjiao Lei
Shan Yang
Lei Xie
52
56
0
17 Nov 2020
Recent Developments on ESPnet Toolkit Boosted by Conformer
Recent Developments on ESPnet Toolkit Boosted by Conformer
Pengcheng Guo
Florian Boyer
Xuankai Chang
Tomoki Hayashi
Yosuke Higuchi
...
Jing Shi
Shinji Watanabe
Kun Wei
Wangyou Zhang
Yuekai Zhang
57
263
0
26 Oct 2020
Clova Baseline System for the VoxCeleb Speaker Recognition Challenge
  2020
Clova Baseline System for the VoxCeleb Speaker Recognition Challenge 2020
Hee-Soo Heo
Bong-Jin Lee
Jaesung Huh
Joon Son Chung
30
133
0
29 Sep 2020
Expressive TTS Training with Frame and Style Reconstruction Loss
Expressive TTS Training with Frame and Style Reconstruction Loss
Rui Liu
Berrak Sisman
Guanglai Gao
Haizhou Li
69
73
0
04 Aug 2020
Fully-hierarchical fine-grained prosody modeling for interpretable
  speech synthesis
Fully-hierarchical fine-grained prosody modeling for interpretable speech synthesis
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuanbin Cao
Heiga Zen
Yonghui Wu
40
130
0
06 Feb 2020
Generating diverse and natural text-to-speech samples using a quantized
  fine-grained VAE and auto-regressive prosody prior
Generating diverse and natural text-to-speech samples using a quantized fine-grained VAE and auto-regressive prosody prior
Guangzhi Sun
Yu Zhang
Ron J. Weiss
Yuan Cao
Heiga Zen
Andrew Rosenberg
Bhuvana Ramabhadran
Yonghui Wu
DiffM
62
92
0
06 Feb 2020
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis
  of Expressive Speech
Using VAEs and Normalizing Flows for One-shot Text-To-Speech Synthesis of Expressive Speech
Vatsal Aggarwal
Marius Cotescu
N. Prateek
Jaime Lorenzo-Trueba
Roberto Barra-Chicote
38
30
0
28 Nov 2019
Prosody Transfer in Neural Text to Speech Using Global Pitch and
  Loudness Features
Prosody Transfer in Neural Text to Speech Using Global Pitch and Loudness Features
Francesco Ferroni
Kilol Gupta
D. Shah
Z. Shakeri
Jervis Pinto
37
15
0
21 Nov 2019
Multi-Reference Neural TTS Stylization with Adversarial Cycle
  Consistency
Multi-Reference Neural TTS Stylization with Adversarial Cycle Consistency
M. Whitehill
Shuang Ma
Daniel J. McDuff
Yale Song
45
35
0
25 Oct 2019
Semi-Supervised Generative Modeling for Controllable Speech Synthesis
Semi-Supervised Generative Modeling for Controllable Speech Synthesis
Raza Habib
Soroosh Mariooryad
Matt Shannon
Eric Battenberg
RJ Skerry-Ryan
Daisy Stanton
David Kao
Tom Bagby
BDL
37
48
0
03 Oct 2019
Universal audio synthesizer control with normalizing flows
Universal audio synthesizer control with normalizing flows
P. Esling
Naotake Masuda
Adrien Bardet
R. Despres
Axel Chemla-Romeu-Santos
46
45
0
01 Jul 2019
End-to-End Emotional Speech Synthesis Using Style Tokens and
  Semi-Supervised Training
End-to-End Emotional Speech Synthesis Using Style Tokens and Semi-Supervised Training
Peng Wu
Zhenhua Ling
Li-Juan Liu
Yuan Jiang
Hong-Chuan Wu
Lirong Dai
34
72
0
26 Jun 2019
A New GAN-based End-to-End TTS Training Algorithm
A New GAN-based End-to-End TTS Training Algorithm
Haohan Guo
Frank Soong
Lei He
Lei Xie
47
47
0
09 Apr 2019
Multi-reference Tacotron by Intercross Training for Style
  Disentangling,Transfer and Control in Speech Synthesis
Multi-reference Tacotron by Intercross Training for Style Disentangling,Transfer and Control in Speech Synthesis
Yanyao Bian
Changbin Chen
Yongguo Kang
Zhenglin Pan
35
46
0
04 Apr 2019
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
A Real-Time Wideband Neural Vocoder at 1.6 kb/s Using LPCNet
J. Valin
Jan Skoglund
51
79
0
28 Mar 2019
Learning latent representations for style control and transfer in
  end-to-end speech synthesis
Learning latent representations for style control and transfer in end-to-end speech synthesis
Ya-Jie Zhang
Shifeng Pan
Lei He
Zhenhua Ling
BDL
SSL
DRL
48
228
0
11 Dec 2018
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
LPCNet: Improving Neural Speech Synthesis Through Linear Prediction
J. Valin
Jan Skoglund
62
451
0
28 Oct 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis
Hierarchical Generative Modeling for Controllable Speech Synthesis
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
...
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
BDL
60
275
0
16 Oct 2018
The Deep Weight Prior
The Deep Weight Prior
Andrei Atanov
Arsenii Ashukha
Kirill Struminsky
Dmitry Vetrov
Max Welling
BDL
44
37
0
16 Oct 2018
Predicting Expressive Speaking Style From Text In End-To-End Speech
  Synthesis
Predicting Expressive Speaking Style From Text In End-To-End Speech Synthesis
Daisy Stanton
Yuxuan Wang
RJ Skerry-Ryan
51
122
0
04 Aug 2018
VoxCeleb2: Deep Speaker Recognition
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
348
2,274
0
14 Jun 2018
Transfer Learning from Speaker Verification to Multispeaker
  Text-To-Speech Synthesis
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
251
828
0
12 Jun 2018
Self-Attention Generative Adversarial Networks
Self-Attention Generative Adversarial Networks
Han Zhang
Ian Goodfellow
Dimitris N. Metaxas
Augustus Odena
GAN
129
3,720
0
21 May 2018
Understanding disentangling in $β$-VAE
Understanding disentangling in βββ-VAE
Christopher P. Burgess
I. Higgins
Arka Pal
Loic Matthey
Nicholas Watters
Guillaume Desjardins
Alexander Lerchner
CoGe
DRL
57
829
0
10 Apr 2018
Expressive Speech Synthesis via Modeling Expressions with Variational
  Autoencoder
Expressive Speech Synthesis via Modeling Expressions with Variational Autoencoder
K. Akuzawa
Yusuke Iwasawa
Y. Matsuo
35
139
0
06 Apr 2018
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with
  Tacotron
Towards End-to-End Prosody Transfer for Expressive Speech Synthesis with Tacotron
RJ Skerry-Ryan
Eric Battenberg
Y. Xiao
Yuxuan Wang
Daisy Stanton
Joel Shor
Ron J. Weiss
R. Clark
Rif A. Saurous
54
554
0
24 Mar 2018
Style Tokens: Unsupervised Style Modeling, Control and Transfer in
  End-to-End Speech Synthesis
Style Tokens: Unsupervised Style Modeling, Control and Transfer in End-to-End Speech Synthesis
Yuxuan Wang
Daisy Stanton
Yu Zhang
RJ Skerry-Ryan
Eric Battenberg
Joel Shor
Y. Xiao
Fei Ren
Ye Jia
Rif A. Saurous
64
825
0
23 Mar 2018
Sylvester Normalizing Flows for Variational Inference
Sylvester Normalizing Flows for Variational Inference
Rianne van den Berg
Leonard Hasenclever
Jakub M. Tomczak
Max Welling
BDL
DRL
58
252
0
15 Mar 2018
Efficient Neural Audio Synthesis
Efficient Neural Audio Synthesis
Nal Kalchbrenner
Erich Elsen
Karen Simonyan
Seb Noury
Norman Casagrande
Edward Lockhart
Florian Stimberg
Aaron van den Oord
Sander Dieleman
Koray Kavukcuoglu
87
867
0
23 Feb 2018
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram
  Predictions
Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions
Jonathan Shen
Ruoming Pang
Ron J. Weiss
M. Schuster
Navdeep Jaitly
...
Yuxuan Wang
RJ Skerry-Ryan
Rif A. Saurous
Yannis Agiomyrgiannakis
Yonghui Wu
77
2,694
0
16 Dec 2017
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Parallel WaveNet: Fast High-Fidelity Speech Synthesis
Aaron van den Oord
Yazhe Li
Igor Babuschkin
Karen Simonyan
Oriol Vinyals
...
Alex Graves
Helen King
T. Walters
Dan Belov
Demis Hassabis
181
858
0
28 Nov 2017
Generalized End-to-End Loss for Speaker Verification
Generalized End-to-End Loss for Speaker Verification
Li Wan
Quan Wang
Alan Papir
Ignacio López Moreno
VLM
66
924
0
28 Oct 2017
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence
  Learning
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Ming-Yu Liu
Kainan Peng
Andrew Gibiansky
Sercan O. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller
63
307
0
20 Oct 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
591
130,942
0
12 Jun 2017
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Deep Voice 2: Multi-Speaker Neural Text-to-Speech
Sercan O. Arik
G. Diamos
Andrew Gibiansky
John Miller
Kainan Peng
Ming-Yu Liu
Jonathan Raiman
Yanqi Zhou
70
495
0
24 May 2017
Learning Latent Representations for Speech Generation and Transformation
Learning Latent Representations for Speech Generation and Transformation
Wei-Ning Hsu
Yu Zhang
James R. Glass
DRL
BDL
SSL
50
145
0
13 Apr 2017
Tacotron: Towards End-to-End Speech Synthesis
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
...
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
153
1,819
0
29 Mar 2017
Disentangling factors of variation in deep representations using
  adversarial training
Disentangling factors of variation in deep representations using adversarial training
Michaël Mathieu
Jiaqi Zhao
Pablo Sprechmann
Aditya A. Ramesh
Yann LeCun
DRL
CML
89
490
0
10 Nov 2016
Google's Neural Machine Translation System: Bridging the Gap between
  Human and Machine Translation
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Zhiwen Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
847
6,781
0
26 Sep 2016
Improving Variational Inference with Inverse Autoregressive Flow
Improving Variational Inference with Inverse Autoregressive Flow
Diederik P. Kingma
Tim Salimans
Rafal Jozefowicz
Xi Chen
Ilya Sutskever
Max Welling
BDL
DRL
105
1,816
0
15 Jun 2016
Generating Sentences from a Continuous Space
Generating Sentences from a Continuous Space
Samuel R. Bowman
Luke Vilnis
Oriol Vinyals
Andrew M. Dai
Rafal Jozefowicz
Samy Bengio
DRL
98
2,358
0
19 Nov 2015
Attention-Based Models for Speech Recognition
Attention-Based Models for Speech Recognition
J. Chorowski
Dzmitry Bahdanau
Dmitriy Serdyuk
Kyunghyun Cho
Yoshua Bengio
115
2,606
0
24 Jun 2015
Variational Inference with Normalizing Flows
Variational Inference with Normalizing Flows
Danilo Jimenez Rezende
S. Mohamed
DRL
BDL
284
4,167
0
21 May 2015
MADE: Masked Autoencoder for Distribution Estimation
MADE: Masked Autoencoder for Distribution Estimation
M. Germain
Karol Gregor
Iain Murray
Hugo Larochelle
OOD
SyDa
UQCV
146
867
0
12 Feb 2015
Grammar as a Foreign Language
Grammar as a Foreign Language
Oriol Vinyals
Lukasz Kaiser
Terry Koo
Slav Petrov
Ilya Sutskever
Geoffrey E. Hinton
97
930
0
23 Dec 2014
12
Next