Controllable Emotion Transfer For End-to-End Speech Synthesis

17 November 2020

Tao Li

Shan Yang

Liumeng Xue

Lei Xie


Papers citing "Controllable Emotion Transfer For End-to-End Speech Synthesis"

48 / 48 papers shown

Title | Authors | Tags | Counts | Date
A Review of Human Emotion Synthesis Based on Generative Technology | Fei Ma, Yong Li, Yifan Xie, Y. He, Yujie Zhang, ..., Z. Liu, Wei Yao, Fuji Ren, Fei Richard Yu, Shiguang Ni | - | 78 1 0 | 10 Dec 2024
Emo-DPO: Controllable Emotional Speech Synthesis through Direct Preference Optimization | Xiaoxue Gao, Chen Zhang, Yiming Chen, Huayun Zhang, Nancy F. Chen | - | 47 6 0 | 16 Sep 2024
Laugh Now Cry Later: Controlling Time-Varying Emotional States of Flow-Matching-Based Zero-Shot Text-to-Speech | Haibin Wu, Xiaofei Wang, Sefik Emre Eskimez, Manthan Thakker, Daniel Tompkins, ..., Canrun Li, Zhen Xiao, Sheng Zhao, Jinyu Li, Naoyuki Kanda | - | 28 6 0 | 17 Jul 2024
DEX-TTS: Diffusion-based EXpressive Text-to-Speech with Style Modeling on Time Variability | Hyun Joon Park, Jin Sob Kim, Wooseok Shin, Sung Won Han | DiffM | 41 2 0 | 27 Jun 2024
RSET: Remapping-based Sorting Method for Emotion Transfer Speech Synthesis | Haoxiang Shi, Jianzong Wang, Xulong Zhang, Ning Cheng, Jun Yu, Jing Xiao | - | 44 2 0 | 27 May 2024
Fine-Grained Quantitative Emotion Editing for Speech Generation | Sho Inoue, Kun Zhou, Shuai Wang, Haizhou Li | - | 43 2 0 | 04 Mar 2024
Boosting Multi-Speaker Expressive Speech Synthesis with Semi-supervised Contrastive Learning | Xinfa Zhu, Yuke Li, Yinjiao Lei, Ning Jiang, Guoqing Zhao, Lei Xie | - | 28 0 0 | 26 Oct 2023
U-Style: Cascading U-nets with Multi-level Speaker and Style Modeling for Zero-Shot Voice Cloning | Tao Li, Zhichao Wang, Xinfa Zhu, Jian Cong, Qiao Tian, Yuping Wang, Lei Xie | DiffM | 35 3 0 | 06 Oct 2023
The Importance of Multimodal Emotion Conditioning and Affect Consistency for Embodied Conversational Agents | Che-Jui Chang, Samuel S. Sohn, Sen Zhang, R. Jayashankar, Muhammad Usman, Mubbasir Kapadia | - | 38 7 0 | 26 Sep 2023
HiGNN-TTS: Hierarchical Prosody Modeling with Graph Neural Networks for Expressive Long-form TTS | Dake Guo, Xinfa Zhu, Liumeng Xue, Tao Li, Yuanjun Lv, Yuepeng Jiang, Linfu Xie | - | 22 1 0 | 25 Sep 2023
Diversity-based core-set selection for text-to-speech with linguistic and acoustic features | Kentaro Seki, Shinnosuke Takamichi, Takaaki Saeki, Hiroshi Saruwatari | - | 26 3 0 | 15 Sep 2023
DiCLET-TTS: Diffusion Model based Cross-lingual Emotion Transfer for Text-to-Speech -- A Study between English and Mandarin | Tao Li, Chenxu Hu, Jian Cong, Xinfa Zhu, Jingbei Li, Qiao Tian, Yuping Wang, Linfu Xie | DiffM | 43 8 0 | 02 Sep 2023
AffectEcho: Speaker Independent and Language-Agnostic Emotion and Affect Transfer for Speech Synthesis | Hrishikesh Viswanath, Aneesh Bhattacharya, Pascal Jutras-Dubé, Prerit Gupta, Mridu Prashanth, Yashvardhan Khaitan, Aniket Bera | - | 32 2 0 | 16 Aug 2023
SC VALL-E: Style-Controllable Zero-Shot Text to Speech Synthesizer | Daegyeom Kim, Seong-soo Hong, Yong-Hoon Choi | - | 25 2 0 | 20 Jul 2023
EmoSpeech: Guiding FastSpeech2 Towards Emotional Text to Speech | Daria Diatlova, V. Shutov | - | 34 8 0 | 28 Jun 2023
EmoMix: Emotion Mixing via Diffusion Models for Emotional Speech Synthesis | Haobin Tang, Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao | DiffM | 19 24 0 | 01 Jun 2023
PromptStyle: Controllable Style Transfer for Text-to-Speech with Natural Language Descriptions | Guanghou Liu, Yongmao Zhang, Yinjiao Lei, Yunlin Chen, Rui Wang, Zhifei Li, Linfu Xie | - | 39 37 0 | 31 May 2023
Accented Text-to-Speech Synthesis with Limited Data | Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li | - | 34 12 0 | 08 May 2023
Style-Label-Free: Cross-Speaker Style Transfer by Quantized VAE and Speaker-wise Normalization in Speech Synthesis | Chunyu Qiang, Peng Yang, Hao Che, Xiaorui Wang, Zhongyuan Wang | BDL | 34 6 0 | 13 Dec 2022
Multi-Speaker Expressive Speech Synthesis via Multiple Factors Decoupling | Xinfa Zhu, Yinjiao Lei, Kun Song, Yongmao Zhang, Tao Li, Linfu Xie | - | 21 17 0 | 19 Nov 2022
Semi-supervised learning for continuous emotional intensity controllable speech synthesis with disentangled representations | Yoorim Oh, Juheon Lee, Yoseob Han, Kyogu Lee | - | 28 3 0 | 11 Nov 2022
Multi-Speaker Multi-Style Speech Synthesis with Timbre and Style Disentanglement | Wei Song, Ya Yue, Ya-Jie Zhang, Zhengchen Zhang, Youzheng Wu, Xiaodong He | - | 32 4 0 | 02 Nov 2022
AccentSpeech: Learning Accent from Crowd-sourced Data for Target Speaker TTS with Accents | Yongmao Zhang, Zhichao Wang, Pei-Yin Yang, Hongshen Sun, Zhisheng Wang, Linfu Xie | - | 28 6 0 | 31 Oct 2022
Read it to me: An emotionally aware Speech Narration Application | Rishibha Bansal | - | 16 0 0 | 06 Sep 2022
Speech Synthesis with Mixed Emotions | Kun Zhou, Berrak Sisman, R. Rana, B. W. Schuller, Haizhou Li | - | 27 44 0 | 11 Aug 2022
Controllable Data Generation by Deep Learning: A Review | Shiyu Wang, Yuanqi Du, Xiaojie Guo, Bo Pan, Zhaohui Qin, Liang Zhao | - | 33 28 0 | 19 Jul 2022
Text-driven Emotional Style Control and Cross-speaker Style Transfer in Neural TTS | Yookyung Shin, Younggun Lee, Suhee Jo, Yeongtae Hwang, Taesu Kim | - | 25 14 0 | 13 Jul 2022
Cross-speaker Emotion Transfer Based On Prosody Compensation for End-to-End Speech Synthesis | Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Ming Jiang, Linfu Xie | - | 35 15 0 | 04 Jul 2022
iEmoTTS: Toward Robust Cross-Speaker Emotion Transfer and Control for Speech Synthesis based on Disentanglement between Prosody and Timbre | Guangyan Zhang, Ying Qin, Wenbo Zhang, Jialun Wu, Mei Li, Yu Gai, Feijun Jiang, Tan Lee | - | 50 26 0 | 29 Jun 2022
Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning | Rui Liu, Berrak Sisman, Björn Schuller, Guanglai Gao, Haizhou Li | - | 27 11 0 | 15 Jun 2022
MuSE-SVS: Multi-Singer Emotional Singing Voice Synthesizer that Controls Emotional Intensity | Sungjae Kim, Y.E. Kim, Jewoo Jun, Injung Kim | - | 31 13 0 | 02 Mar 2022
Disentangling Style and Speaker Attributes for TTS Style Transfer | Xiaochun An, Frank Soong, Lei Xie | - | 68 18 0 | 24 Jan 2022
MsEmoTTS: Multi-scale emotion transfer, prediction, and control for emotional speech synthesis | Yinjiao Lei, Shan Yang, Xinsheng Wang, Lei Xie | - | 27 73 0 | 17 Jan 2022
Emotion Intensity and its Control for Emotional Voice Conversion | Kun Zhou, Berrak Sisman, R. Rana, Björn W. Schuller, Haizhou Li | - | 65 54 0 | 10 Jan 2022
Multi-speaker Multi-style Text-to-speech Synthesis With Single-speaker Single-style Training Data Scenarios | Qicong Xie, Tao Li, Xinsheng Wang, Zhichao Wang, Lei Xie, Guoqiao Yu, Guanglu Wan | - | 32 11 0 | 23 Dec 2021
Fine-grained style control in Transformer-based Text-to-speech Synthesis | Li-Wei Chen, Alexander I. Rudnicky | - | 88 29 0 | 12 Oct 2021
Environment Aware Text-to-Speech Synthesis | Daxin Tan, Guangyan Zhang, Tan Lee | - | 13 3 0 | 08 Oct 2021
StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis | Rui Liu, Berrak Sisman, Haizhou Li | - | 29 2 0 | 07 Oct 2021
Emotional Speech Synthesis for Companion Robot to Imitate Professional Caregiver Speech | Takeshi Homma, Qinghua Sun, Takuya Fujioka, R. Takawaki, Eriko Ankyu, Kenji Nagamatsu, Daichi Sugawara, E. Harada | - | 17 1 0 | 27 Sep 2021
Cross-speaker emotion disentangling and transfer for end-to-end speech synthesis | Tao Li, Xinsheng Wang, Qicong Xie, Zhichao Wang, Linfu Xie | - | 26 42 0 | 14 Sep 2021
A Survey on Neural Speech Synthesis | Xu Tan, Tao Qin, Frank Soong, Tie-Yan Liu | AI4TS | 18 352 0 | 29 Jun 2021
Improving Performance of Seen and Unseen Speech Style Transfer in End-to-end Neural TTS | Xiaochun An, Frank Soong, Lei Xie | - | 42 9 0 | 18 Jun 2021
EMOVIE: A Mandarin Emotion Speech Dataset with a Simple Emotional Text-to-Speech Model | Chenye Cui, Yi Ren, Jinglin Liu, Feiyang Chen, Rongjie Huang, Ming Lei, Zhou Zhao | - | 24 35 0 | 17 Jun 2021
Emotional Voice Conversion: Theory, Databases and ESD | Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li | - | 33 168 0 | 31 May 2021
Review of end-to-end speech synthesis technology based on deep learning | Zhaoxi Mu, Xinyu Yang, Yizhuo Dong | AuLLM, ALM | 26 24 0 | 20 Apr 2021
Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability | Rui Liu, Berrak Sisman, Haizhou Li | - | 34 32 0 | 03 Apr 2021
Investigating on Incorporating Pretrained and Learnable Speaker Representations for Multi-Speaker Multi-Style Text-to-Speech | C. Chien, Jheng-hao Lin, Chien-yu Huang, Po-Chun Hsu, Hung-yi Lee | - | 27 68 0 | 06 Mar 2021
Fine-grained Style Modeling, Transfer and Prediction in Text-to-Speech Synthesis via Phone-Level Content-Style Disentanglement | Daxin Tan, Tan Lee | - | 29 21 0 | 08 Nov 2020