Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition

11 June 2019

Abstract

This paper proposes a Residual Convolutional Neural Network (ResNet) that operates on speech features and is trained with Focal Loss to recognize emotion in speech. Speech features such as spectrograms and Mel-frequency Cepstral Coefficients (MFCCs) have shown the ability to characterize emotion better than plain text alone. Furthermore, Focal Loss, first used in one-stage object detectors, has shown the ability to focus training on hard examples and to down-weight the loss assigned to well-classified examples, which prevents the model from being overwhelmed by easily classifiable examples.
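
The down-weighting behaviour described in the abstract can be illustrated with a minimal NumPy sketch of the standard multi-class focal loss, FL(p_t) = -(1 - p_t)^gamma * log(p_t), where p_t is the predicted probability of the true class. The abstract does not give the paper's hyperparameters or exact formulation, so the function names, the value gamma = 2.0, and the four-class toy logits below are assumptions chosen only for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with max subtraction for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def focal_loss(logits, targets, gamma=2.0, eps=1e-12):
    """Per-example focal loss: -(1 - p_t)^gamma * log(p_t).

    gamma=2.0 is an assumed default, not a value taken from the paper.
    With gamma=0 this reduces to ordinary cross-entropy.
    """
    probs = softmax(logits)
    # Probability the model assigns to the true class of each example.
    p_t = probs[np.arange(len(targets)), targets]
    return -((1.0 - p_t) ** gamma) * np.log(p_t + eps)


# Toy batch with four hypothetical emotion classes.
# Row 0 is an easy example (confident and correct); row 1 is a hard one.
logits = np.array([[4.0, 0.0, 0.0, 0.0],
                   [0.5, 0.4, 0.3, 0.2]])
targets = np.array([0, 0])

ce = focal_loss(logits, targets, gamma=0.0)  # plain cross-entropy
fl = focal_loss(logits, targets, gamma=2.0)  # focal loss

# Fraction of the cross-entropy loss that survives the focal modulation.
print("retained fraction (easy, hard):", fl / ce)
```

The retained fraction equals (1 - p_t)^gamma, so it is close to zero for the confident example and much larger for the uncertain one. This is the sense in which training is steered toward hard examples.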


@article{tripathi2025_1906.05682,
  title={Focal Loss based Residual Convolutional Neural Network for Speech Emotion Recognition},
  author={Suraj Tripathi and Abhay Kumar and Abhiram Ramesh and Chirag Singh and Promod Yenigalla},
  journal={arXiv preprint arXiv:1906.05682},
  year={2019}
}
