Despite remarkable progress in synthesizing emotional speech from text, it remains challenging to add emotion information to existing speech segments. Previous methods rely mainly on parallel data, which is difficult to obtain, and few works have studied the ability of one model to generalize emotion transfer across different languages. To address these problems, we propose ET-GAN, an emotion transfer system that learns language-independent emotion transfer from one emotion to another without parallel training samples. Built on a cycle-consistent generative adversarial network, our method transfers only the emotion, using novel yet simple loss designs. In addition, we introduce a domain adaptation approach for migrating emotion information across different languages. Experimental results show that our method efficiently generates high-quality emotional speech for any given emotion category, without aligned speech pairs.
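To make the cycle-consistent setup concrete, the following is a minimal, hypothetical PyTorch sketch of generic CycleGAN-style generator losses (adversarial, cycle-consistency, and identity terms) applied to acoustic feature sequences. The network shapes, loss weights, and feature dimensions are illustrative assumptions, not the actual ET-GAN architecture or loss designs described in the paper.

```python
import torch
import torch.nn as nn

# Placeholder 1-D convolutional generator and discriminator operating on
# acoustic feature sequences (e.g. mel-spectrogram frames). These are
# illustrative stand-ins, not the ET-GAN networks from the paper.
class Generator(nn.Module):
    def __init__(self, feat_dim=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=5, padding=2),
            nn.ReLU(),
            nn.Conv1d(256, feat_dim, kernel_size=5, padding=2),
        )

    def forward(self, x):            # x: (batch, feat_dim, frames)
        return self.net(x)

class Discriminator(nn.Module):
    def __init__(self, feat_dim=80):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(feat_dim, 256, kernel_size=5, padding=2),
            nn.LeakyReLU(0.2),
            nn.Conv1d(256, 1, kernel_size=5, padding=2),
        )

    def forward(self, x):
        return self.net(x).mean(dim=(1, 2))   # one realness score per sample

G_ab, G_ba = Generator(), Generator()   # emotion A -> B and B -> A mappings
D_b = Discriminator()                   # judges whether features carry emotion B
l1, mse = nn.L1Loss(), nn.MSELoss()

def generator_losses(x_a, x_b, lambda_cyc=10.0, lambda_id=5.0):
    """Generic CycleGAN-style generator losses for the A -> B direction."""
    fake_b = G_ab(x_a)                          # transfer emotion A -> B
    # Adversarial term (least-squares GAN variant): fool the discriminator.
    adv = mse(D_b(fake_b), torch.ones(x_a.size(0)))
    # Cycle-consistency term: mapping back should recover the input, so only
    # emotion-related characteristics are allowed to change.
    cyc = l1(G_ba(fake_b), x_a)
    # Identity term: a sample already in domain B should pass through
    # (approximately) unchanged, helping preserve linguistic content.
    idt = l1(G_ab(x_b), x_b)
    return adv + lambda_cyc * cyc + lambda_id * idt

# Toy usage with random tensors standing in for acoustic features.
x_a = torch.randn(4, 80, 128)   # batch of source-emotion utterances
x_b = torch.randn(4, 80, 128)   # batch of target-emotion utterances
loss = generator_losses(x_a, x_b)
loss.backward()
```

The cycle-consistency and identity terms are what allow training without aligned speech pairs: no parallel utterance in the target emotion is ever compared against directly.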