Clinical Depression and Affect Recognition with EmoAudioNet

Automatic emotion recognition and Major Depressive Disorder (MDD) diagnosis are inherently challenging problems in health informatics applications. According to the World Health Organization, more than 300 million people were affected by depression in 2017, with only a third of them correctly identified. MDD is a persistent mood disorder in which the patient constantly feels negative emotions (low valence) and lacks excitement and interest (low arousal). Developing emotionally intelligent systems therefore enhances depression assessment at an early stage and improves the quality of care for depressed patients. In this paper, we present EmoAudioNet, a framework for emotion recognition and depression diagnosis from speech. To preserve the privacy of patient data, only speech is used. The deep EmoAudioNet model studies the time-frequency representation of the audio signal and the visual representation of its spectrum of frequencies. Two datasets are used in the experiments: RECOLA for continuous dimensional emotion recognition and DAIC-WOZ for automatic depression diagnosis. Extensive experiments show that the proposed approach significantly outperforms previous works in the literature on both datasets.
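The abstract mentions two complementary inputs: a time-frequency representation of the audio and a visual representation of its frequency spectrum (a spectrogram). The sketch below is a hypothetical illustration of preparing such dual inputs, not the authors' actual pipeline; it uses `scipy.signal.spectrogram` on a synthetic tone, with a simple log-power feature standing in for a learned cepstral branch, and a naive concatenation standing in for the network's feature fusion.

```python
import numpy as np
from scipy.signal import spectrogram

# Synthetic 1-second, 16 kHz audio signal (a 440 Hz tone) standing in
# for a speech recording; sr and the STFT parameters are assumptions.
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
signal = np.sin(2 * np.pi * 440 * t)

# Branch 1: visual/spectrogram representation of the frequency spectrum.
freqs, times, spec = spectrogram(signal, fs=sr, nperseg=512, noverlap=256)

# Branch 2: a compact time-frequency feature (log power per bin/frame),
# a stand-in for MFCC-style features, which would need e.g. librosa.
log_power = np.log(spec + 1e-10)

# Naive late fusion: summarize each branch over time and concatenate,
# mimicking (very loosely) a two-branch network's merged feature vector.
fused = np.concatenate([spec.mean(axis=1), log_power.mean(axis=1)])
print(spec.shape, fused.shape)
```

In an actual model, each branch would feed a convolutional sub-network and the fusion would be learned rather than a fixed concatenation of time-averaged features.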