Facial expression recognition is a challenging classification task that holds broad application prospects in the field of human-computer interaction. This paper aims to introduce the method we will adopt in the 8th Affective and Behavioral Analysis in the Wild (ABAW) Competition, which will be held during the Conference on Computer Vision and Pattern Recognition (CVPR) inthis http URLof all, we apply the frequency masking technique and the method of extracting data at equal time intervals to conduct targeted processing on the original videos. Then, based on the residual hybrid convolutional neural network and the multi-branch convolutional neural network respectively, we design feature extraction models for image and audio sequences. In particular, we propose a global channel-spatial attention mechanism to enhance the features initially extracted from both the audio and image modalitiesthis http URL, we adopt a decision fusion strategy based on the proportional criterion to fuse the classification results of the two single modalities, obtain an emotion probability vector, and output the final emotional classification. We also design a coarse - fine granularity loss function to optimize the performance of the entire network, which effectively improves the accuracy of facial expressionthis http URLthe facial expression recognition task of the 8th ABAW Competition, our method ranked third on the official validation set. This result fully confirms the effectiveness and competitiveness of the method we have proposed.
View on arXiv@article{yu2025_2503.11935, title={ Design of an Expression Recognition Solution Based on the Global Channel-Spatial Attention Mechanism and Proportional Criterion Fusion }, author={ Jun Yu and Yang Zheng and Lei Wang and Yongqi Wang and Shengfan Xu }, journal={arXiv preprint arXiv:2503.11935}, year={ 2025 } }