ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models

23 May 2023

Papers citing "ZET-Speech: Zero-shot adaptive Emotion-controllable Text-to-Speech Synthesis with Diffusion and Style-based Models"

Title | Authors | Tags | Counts | Date
OZSpeech: One-step Zero-shot Speech Synthesis with Learned-Prior-Conditioned Flow Matching | Hieu-Nghia Huynh-Nguyen Ngoc Son Nguyen Huynh Nguyen Dang Thieu Vo Truong-Son Hy Van Nguyen | - | 14 0 0 | 19 May 2025
Voice Cloning: Comprehensive Survey | Hussam Azzuni Abdulmotaleb El Saddik | VLM | 44 0 0 | 01 May 2025
EmoVoice: LLM-based Emotional Text-To-Speech Model with Freestyle Text Prompting | Guanrou Yang Chen Yang Qian Chen Ziyang Ma Wenxi Chen ... Fan Yu Zhihao Du Zhifu Gao Shiliang Zhang Xie Chen | AuLLM | 57 0 0 | 17 Apr 2025
A Review of Human Emotion Synthesis Based on Generative Technology | Fei Ma Yong Li Yifan Xie Y. He Yujie Zhang ... Z. Liu Wei Yao Fuji Ren Fei Richard Yu Shiguang Ni | - | 78 1 0 | 10 Dec 2024
EmoKnob: Enhance Voice Cloning with Fine-Grained Emotion Control | Haozhe Chen Run Chen Julia Hirschberg | - | 26 3 0 | 01 Oct 2024
Exploring synthetic data for cross-speaker style transfer in style representation based TTS | Lucas Ueda Leonardo B. de M. M. Marques Flávio O. Simões Mário Uliani Neto Fernando Runstein Bianca Dal Bó Paula D. P. Costa | - | 33 0 0 | 25 Sep 2024
StyleFusion TTS: Multimodal Style-control and Enhanced Feature Fusion for Zero-shot Text-to-speech Synthesis | Zhiyong Chen Xinnuo Li Zhiqi Ai Shugong Xu | DiffM | 36 1 0 | 24 Sep 2024
Enhancing Emotional Text-to-Speech Controllability with Natural Language Guidance through Contrastive Learning and Diffusion Models | Xin Jing Kun Zhou Andreas Triantafyllopoulos Björn W. Schuller | DiffM | 42 3 0 | 10 Sep 2024
EmoSphere-TTS: Emotional Style and Intensity Modeling via Spherical Emotion Vector for Controllable Emotional Text-to-Speech | Deok-Hyeon Cho Hyung-Seok Oh Seung-Bin Kim Sang-Hoon Lee Seong-Whan Lee | - | 45 7 0 | 12 Jun 2024
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning | Tao Li Zhichao Wang Xinfa Zhu Jian Cong Qiao Tian Yuping Wang Lei Xie | DiffM | 35 3 0 | 06 Oct 2023
On the Design Fundamentals of Diffusion Models: A Survey | Ziyi Chang George Alex Koulieris Hubert P. H. Shum | DiffM | 29 54 0 | 07 Jun 2023
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis | Tao Li Xinsheng Wang Qicong Xie Zhichao Wang Ming Jiang Linfu Xie | - | 35 15 0 | 04 Jul 2022
Guided-TTS 2: A Diffusion Model for High-quality Adaptive Text-to-Speech with Untranscribed Data | Sungwon Kim Heeseung Kim Sung-Hoon Yoon | DiffM | 204 52 0 | 30 May 2022
A Style-Based Generator Architecture for Generative Adversarial Networks | Tero Karras S. Laine Timo Aila | - | 306 10,378 0 | 12 Dec 2018
Domain-Adversarial Training of Neural Networks | Yaroslav Ganin E. Ustinova Hana Ajakan Pascal Germain Hugo Larochelle François Laviolette M. Marchand Victor Lempitsky | GAN OOD | 179 9,342 0 | 28 May 2015