arXiv:1906.05507

Adjusting Pleasure-Arousal-Dominance for Continuous Emotional Text-to-speech Synthesizer

13 June 2019
Azam Rabiee
Tae-Ho Kim
Soo-Young Lee
Abstract

Emotion is not limited to discrete categories such as happy, sad, angry, fear, disgust, and surprise. Instead, each emotion category is projected onto a set of nearly independent dimensions named pleasure (or valence), arousal, and dominance, known as PAD. The value of each dimension varies from -1 to 1, so that the neutral emotion lies at the center with all-zero values. Training a continuous emotional text-to-speech (TTS) synthesizer on these independent dimensions makes it possible to synthesize emotional speech with an unlimited number of emotion categories. Our end-to-end neural speech synthesizer is based on the well-known Tacotron. Empirically, we have found the optimum network architecture for injecting the 3D PAD values. Moreover, the PAD values are adjusted for the purpose of speech synthesis.
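
The PAD representation described in the abstract can be illustrated with a minimal Python sketch. It only models the conditioning vector itself: three values in [-1, 1] with neutral at the origin. The class name `PAD`, the `scaled` helper, and the example coordinates are assumptions made for illustration; they are not taken from the paper, and the sketch says nothing about how the authors inject the values into Tacotron.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PAD:
    """A point in pleasure-arousal-dominance space; each axis lies in [-1, 1]."""
    pleasure: float
    arousal: float
    dominance: float

    def __post_init__(self):
        # Reject values outside the range stated in the abstract.
        for name in ("pleasure", "arousal", "dominance"):
            value = getattr(self, name)
            if not -1.0 <= value <= 1.0:
                raise ValueError(f"{name}={value} is outside [-1, 1]")

    def scaled(self, intensity: float) -> "PAD":
        """Return a point between neutral (intensity 0) and this point (intensity 1)."""
        if not 0.0 <= intensity <= 1.0:
            raise ValueError("intensity must be in [0, 1]")
        return PAD(self.pleasure * intensity,
                   self.arousal * intensity,
                   self.dominance * intensity)

    def as_vector(self) -> list:
        """Return the 3D vector that a synthesizer could take as conditioning input."""
        return [self.pleasure, self.arousal, self.dominance]


# Neutral emotion sits at the center of the space with all-zero values.
NEUTRAL = PAD(0.0, 0.0, 0.0)

# Hypothetical coordinates for illustration only; not values from the paper.
EXAMPLE_EMOTION = PAD(0.8, 0.5, 0.4)

half_strength = EXAMPLE_EMOTION.scaled(0.5)
print(half_strength.as_vector())  # [0.4, 0.25, 0.2]
```

Because the dimensions are continuous, any point in the cube is a valid input, which is what allows intermediate or blended emotions rather than a fixed set of labels.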
