ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.04004
  4. Cited By
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling
  for Zero-Shot Voice Cloning

U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning

6 October 2023
Tao Li
Zhichao Wang
Xinfa Zhu
Jian Cong
Qiao Tian
Yuping Wang
Lei Xie
    DiffM
ArXiv (abs)PDFHTML

Papers citing "U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning"

22 / 22 papers shown
Title
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech
  Synthesis with Diffusion and Style-based Models
ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models
Minki Kang
Wooseok Han
Sung Ju Hwang
Eunho Yang
DiffM
84
19
0
23 May 2023
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised
  Style Extractor and Hierarchical Modeling in Speech Synthesis
Improving Prosody for Cross-Speaker Style Transfer by Semi-Supervised Style Extractor and Hierarchical Modeling in Speech Synthesis
Chunyu Qiang
Peng Yang
Hao Che
Ying Zhang
Xiaorui Wang
Zhong-ming Wang
72
9
0
14 Mar 2023
SNAC: Speaker-normalized affine coupling layer in flow-based
  architecture for zero-shot multi-speaker text-to-speech
SNAC: Speaker-normalized affine coupling layer in flow-based architecture for zero-shot multi-speaker text-to-speech
Byoung Jin Choi
Myeonghun Jeong
Joun Yeop Lee
N. Kim
104
13
0
30 Nov 2022
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with
  Diffusion Models
Grad-StyleSpeech: Any-speaker Adaptive Text-to-Speech Synthesis with Diffusion Models
Minki Kang
Dong Min
Sung Ju Hwang
DiffM
98
50
0
17 Nov 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for
  End-to-End Speech Synthesis
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis
Tao Li
Xinsheng Wang
Qicong Xie
Zhichao Wang
Ming Jiang
Linfu Xie
90
16
0
04 Jul 2022
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice
  Conversion for everyone
YourTTS: Towards Zero-Shot Multi-Speaker TTS and Zero-Shot Voice Conversion for everyone
Edresson Casanova
Julian Weber
C. Shulby
Arnaldo Cândido Júnior
Eren Golge
M. Ponti
238
415
0
04 Dec 2021
Neural Analysis and Synthesis: Reconstructing Speech from
  Self-Supervised Representations
Neural Analysis and Synthesis: Reconstructing Speech from Self-Supervised Representations
Hyeong-Seok Choi
Juheon Lee
W. Kim
Jie Hwan Lee
Hoon Heo
Kyogu Lee
100
158
0
27 Oct 2021
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Meta-StyleSpeech : Multi-Speaker Adaptive Text-to-Speech Generation
Dong Min
Dong Bok Lee
Eunho Yang
Sung Ju Hwang
125
175
0
06 Jun 2021
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech
Vadim Popov
Ivan Vovk
Vladimir Gogoryan
Tasnima Sadekova
Mikhail Kudinov
DiffM
110
543
0
13 May 2021
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
SC-GlowTTS: an Efficient Zero-Shot Multi-Speaker Text-To-Speech Model
Edresson Casanova
C. Shulby
Eren Golge
Nicolas Müller
F. S. Oliveira
Arnaldo Cândido Júnior
A. S. Soares
S. Aluísio
M. Ponti
58
100
0
02 Apr 2021
Controllable Emotion Transfer For End-to-End Speech Synthesis
Controllable Emotion Transfer For End-to-End Speech Synthesis
Tao Li
Shan Yang
Liumeng Xue
Lei Xie
79
74
0
17 Nov 2020
Seen and Unseen emotional style transfer for voice conversion with a new
  emotional speech dataset
Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset
Kun Zhou
Berrak Sisman
Rui Liu
Haizhou Li
89
192
0
28 Oct 2020
HiFi-GAN: Generative Adversarial Networks for Efficient and High
  Fidelity Speech Synthesis
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Jungil Kong
Jaehyeon Kim
Jaekyoung Bae
179
1,952
0
12 Oct 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
229
3,164
0
16 May 2020
Mellotron: Multispeaker expressive voice synthesis by conditioning on
  rhythm, pitch and global style tokens
Mellotron: Multispeaker expressive voice synthesis by conditioning on rhythm, pitch and global style tokens
Rafael Valle
Jason Chun Lok Li
R. Prenger
Bryan Catanzaro
72
149
0
26 Oct 2019
Learning latent representations for style control and transfer in
  end-to-end speech synthesis
Learning latent representations for style control and transfer in end-to-end speech synthesis
Ya-Jie Zhang
Shifeng Pan
Lei He
Zhenhua Ling
BDLSSLDRL
71
229
0
11 Dec 2018
Hierarchical Generative Modeling for Controllable Speech Synthesis
Hierarchical Generative Modeling for Controllable Speech Synthesis
Wei-Ning Hsu
Yu Zhang
Ron J. Weiss
Heiga Zen
Yonghui Wu
...
Ye Jia
Zhiwen Chen
Jonathan Shen
Patrick Nguyen
Ruoming Pang
BDL
77
276
0
16 Oct 2018
Transfer Learning from Speaker Verification to Multispeaker
  Text-To-Speech Synthesis
Transfer Learning from Speaker Verification to Multispeaker Text-To-Speech Synthesis
Ye Jia
Yu Zhang
Ron J. Weiss
Quan Wang
Jonathan Shen
...
Zhiwen Chen
Patrick Nguyen
Ruoming Pang
Ignacio López Moreno
Yonghui Wu
266
837
0
12 Jun 2018
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence
  Learning
Deep Voice 3: Scaling Text-to-Speech with Convolutional Sequence Learning
Ming-Yu Liu
Kainan Peng
Andrew Gibiansky
Sercan O. Arik
Ajay Kannan
Sharan Narang
Jonathan Raiman
John Miller
79
309
0
20 Oct 2017
Tacotron: Towards End-to-End Speech Synthesis
Tacotron: Towards End-to-End Speech Synthesis
Yuxuan Wang
RJ Skerry-Ryan
Daisy Stanton
Yonghui Wu
Ron J. Weiss
...
Samy Bengio
Quoc V. Le
Yannis Agiomyrgiannakis
R. Clark
Rif A. Saurous
173
1,833
0
29 Mar 2017
Instance Normalization: The Missing Ingredient for Fast Stylization
Instance Normalization: The Missing Ingredient for Fast Stylization
Dmitry Ulyanov
Andrea Vedaldi
Victor Lempitsky
OOD
180
3,714
0
27 Jul 2016
Unsupervised Domain Adaptation by Backpropagation
Unsupervised Domain Adaptation by Backpropagation
Yaroslav Ganin
Victor Lempitsky
OOD
258
6,043
0
26 Sep 2014
1