85
0

Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning

Abstract

Zero-shot Learning(ZSL) attains knowledge transfer from seen classes to unseen classes by exploring auxiliary category information, which is a promising yet difficult research topic. In this field, Audio-Visual Generalized Zero-Shot Learning~(AV-GZSL) has aroused researchers' great interest in which intricate relations within triple modalities~(audio, video, and natural language) render this task quite challenging but highly research-worthy. However, both existing embedding-based and generative-based AV-GZSL methods tend to suffer from domain shift problem a lot and we propose an extremely simple Out-of-distribution~(OOD) detection based AV-GZSL method~(EZ-AVOOD) to further mitigate bias problem by differentiating seen and unseen samples at the initial beginning. EZ-AVOOD accomplishes effective seen-unseen separation by exploiting the intrinsic discriminative information held in class-specific logits and class-agnostic feature subspace without training an extra OOD detector network. Followed by seen-unseen binary classification, we employ two expert models to classify seen samples and unseen samples separately. Compared to existing state-of-the-art methods, our model achieves superior ZSL and GZSL performances on three audio-visual datasets and becomes the new SOTA, which comprehensively demonstrates the effectiveness of the proposed EZ-AVOOD.

View on arXiv
@article{liu2025_2503.22197,
  title={ Extremely Simple Out-of-distribution Detection for Audio-visual Generalized Zero-shot Learning },
  author={ Yang Liu and Xun Zhang and Jiale Du and Xinbo Gao and Jungong Han },
  journal={arXiv preprint arXiv:2503.22197},
  year={ 2025 }
}
Comments on this paper

We use cookies and other tracking technologies to improve your browsing experience on our website, to show you personalized content and targeted ads, to analyze our website traffic, and to understand where our visitors are coming from. See our policy.