Mood as a Contextual Cue for Improved Emotion Inference

Psychological studies observe that emotions are rarely expressed in isolation and are typically influenced by the surrounding context. While recent studies effectively harness uni- and multimodal cues for emotion inference, hardly any study has considered the effect of long-term affect, or \emph{mood}, on short-term \emph{emotion} inference. This study (a) proposes time-continuous \emph{valence} prediction from videos, fusing multimodal cues including \emph{mood} and \emph{emotion-change} () labels, (b) serially integrates spatial and channel attention for improved inference, and (c) demonstrates algorithmic generalisability with experiments on the \emph{EMMA} and \emph{AffWild2} datasets. Empirical results affirm that utilising mood labels is highly beneficial for dynamic valence prediction. Comparing \emph{unimodal} (training only with mood labels) vs \emph{multimodal} (training with mood and labels) results, inference performance improves for the latter, conveying that both long and short-term contextual cues are critical for time-continuous emotion inference.
View on arXiv