Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition

1 June 2025
Muzammil Behzad
Main text: 8 pages, 3 figures, 5 tables; bibliography: 2 pages
Abstract

Facial expression recognition (FER) is a fundamental task in affective computing with applications in human-computer interaction, mental health analysis, and behavioral understanding. In this paper, we propose SMILE-VLM, a self-supervised vision-language model for 3D/4D FER that unifies multiview visual representation learning with natural language supervision. SMILE-VLM learns robust, semantically aligned, and view-invariant embeddings through three core components: multiview decorrelation via a Barlow Twins-style loss, vision-language contrastive alignment, and cross-modal redundancy minimization. Our framework achieves state-of-the-art performance on multiple benchmarks. We further extend SMILE-VLM to the task of 4D micro-expression recognition (MER) to recognize subtle affective cues. Extensive results demonstrate that SMILE-VLM not only surpasses existing unsupervised methods but also matches or exceeds supervised baselines, offering a scalable and annotation-efficient solution for expressive facial behavior understanding.
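
The abstract names three training objectives. Below is a minimal PyTorch sketch of the two standard formulations those names usually refer to: a Barlow Twins-style decorrelation loss between two view embeddings and a CLIP-style image-text contrastive loss. The function names, the lambda weight, and the temperature are illustrative assumptions; the paper's exact projector design, loss weights, and its cross-modal redundancy-minimization term are not specified in the abstract and are not shown here.

# Hedged sketch (PyTorch) of the generic losses referenced in the abstract.
# Names and hyperparameters are illustrative assumptions, not taken from the paper.
import torch
import torch.nn.functional as F

def barlow_twins_loss(z_a: torch.Tensor, z_b: torch.Tensor, lam: float = 5e-3) -> torch.Tensor:
    """Barlow Twins-style loss: decorrelate embeddings of two views of the same sample."""
    n, _ = z_a.shape
    # Standardize each embedding dimension across the batch.
    z_a = (z_a - z_a.mean(0)) / (z_a.std(0) + 1e-6)
    z_b = (z_b - z_b.mean(0)) / (z_b.std(0) + 1e-6)
    # Cross-correlation matrix between the two views.
    c = (z_a.T @ z_b) / n
    on_diag = (torch.diagonal(c) - 1).pow(2).sum()                # invariance term
    off_diag = (c - torch.diag(torch.diagonal(c))).pow(2).sum()   # redundancy-reduction term
    return on_diag + lam * off_diag

def clip_contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """CLIP-style symmetric InfoNCE alignment between image and text embeddings."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)
    logits = img_emb @ txt_emb.T / temperature
    targets = torch.arange(img_emb.size(0), device=img_emb.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.T, targets))

In practice these two terms would be combined with weighting coefficients chosen on a validation set; the weighting used by SMILE-VLM is not stated in the abstract.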

@article{behzad2025_2506.01203,
  title={Self-Supervised Multi-View Representation Learning using Vision-Language Model for 3D/4D Facial Expression Recognition},
  author={Muzammil Behzad},
  journal={arXiv preprint arXiv:2506.01203},
  year={2025}
}