SemIRNet: A Semantic Irony Recognition Network for Multimodal Sarcasm Detection

To address the difficulty of accurately identifying implicit image-text correlations in multimodal irony detection, this paper proposes a Semantic Irony Recognition Network (SemIRNet). The model makes three main contributions: (1) the ConceptNet knowledge base is introduced for the first time to acquire conceptual knowledge, which enhances the model's common-sense reasoning ability; (2) two cross-modal semantic similarity detection modules, one at the word level and one at the sample level, are designed to model image-text correlations at different granularities; and (3) a contrastive learning loss function is introduced to optimize the spatial distribution of sample features, which improves the separability of positive and negative samples. Experiments on a publicly available multimodal irony detection benchmark dataset show that the model's accuracy and F1 score improve by 1.64% and 2.88%, reaching 88.87% and 86.33%, respectively, compared with the best existing methods. Ablation experiments further confirm that knowledge fusion and semantic similarity detection both contribute substantially to the model's performance.
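The abstract does not give formulas for the similarity modules or the contrastive loss, so the following is only a minimal NumPy sketch of what such components could look like, not the authors' implementation. All function names are hypothetical. The word-level score is assumed to be the maximum cosine similarity between each word embedding and any image-region embedding, the sample-level score is assumed to be the cosine similarity of mean-pooled embeddings, and the loss is assumed to be a standard supervised contrastive loss with a temperature parameter.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    """Scale vectors to (approximately) unit length along `axis`."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def word_level_similarity(text_tokens, image_regions):
    """Assumed word-level score: for each word, the best cosine match
    over all image regions. Shapes: (T, d) and (R, d) -> (T,)."""
    t = l2_normalize(text_tokens)
    r = l2_normalize(image_regions)
    return (t @ r.T).max(axis=1)


def sample_level_similarity(text_tokens, image_regions):
    """Assumed sample-level score: cosine similarity between the
    mean-pooled text embedding and the mean-pooled image embedding."""
    t = l2_normalize(text_tokens.mean(axis=0))
    r = l2_normalize(image_regions.mean(axis=0))
    return float(t @ r)


def supervised_contrastive_loss(features, labels, temperature=0.1):
    """Assumed loss: pull same-label samples together and push
    different-label samples apart. Shapes: (N, d) and (N,) -> scalar.
    Requires at least one same-label pair in the batch."""
    z = l2_normalize(features)
    n = z.shape[0]
    not_self = ~np.eye(n, dtype=bool)
    # Pairwise scaled similarities, with self-pairs masked out.
    logits = np.where(not_self, z @ z.T / temperature, -np.inf)
    # Numerically stable log-softmax over the other samples.
    m = logits.max(axis=1, keepdims=True)
    log_prob = logits - m - np.log(np.exp(logits - m).sum(axis=1, keepdims=True))
    positives = (labels[:, None] == labels[None, :]) & not_self
    pos_count = positives.sum(axis=1)
    has_pos = pos_count > 0
    # Average negative log-probability of the positives for each anchor.
    per_anchor = -np.where(positives, log_prob, 0.0).sum(axis=1)
    return float((per_anchor[has_pos] / pos_count[has_pos]).mean())


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(5, 16))    # 5 word embeddings (toy data)
    image = rng.normal(size=(7, 16))   # 7 region embeddings (toy data)
    print(word_level_similarity(text, image).shape)
    print(sample_level_similarity(text, image))
```

Under these assumptions, a batch whose labels agree with the feature clusters yields a lower contrastive loss than the same batch with mismatched labels, which is the sense in which such a loss improves the separability of positive and negative samples.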
@article{zhou2025_2506.14791,
  title={SemIRNet: A Semantic Irony Recognition Network for Multimodal Sarcasm Detection},
  author={Jingxuan Zhou and Yuehao Wu and Yibo Zhang and Yeyubei Zhang and Yunchong Liu and Bolin Huang and Chunhong Yuan},
  journal={arXiv preprint arXiv:2506.14791},
  year={2025}
}