
FreeA: Human-object Interaction Detection using Free Annotation Labels

Abstract

Recent human-object interaction (HOI) detection methods depend on extensively annotated image datasets, which require significant manpower. In this paper, we propose a novel self-adaptive, language-driven HOI detection method, termed FreeA. It leverages the adaptability of a text-image model to generate latent HOI labels without manual annotation. Specifically, FreeA aligns image features of human-object pairs with HOI text templates and employs a knowledge-based masking technique to suppress improbable interactions. Furthermore, FreeA introduces an interaction-correlation matching method that raises the probability of actions correlated with a given action, thereby improving the quality of the generated HOI labels. Experiments on two benchmark datasets show that FreeA achieves state-of-the-art performance among weakly supervised HOI competitors: it gains +13.29 (159%↑) mAP and +17.30 (98%↑) mAP over the newest "Weakly" supervised model, and +7.19 (28%↑) mAP and +14.69 (34%↑) mAP over the latest "Weakly+" supervised model, on HICO-DET and V-COCO respectively, while localizing and classifying interactive actions more accurately. The source code will be made public.
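To make the label-generation idea concrete, here is a minimal, hypothetical sketch (not the authors' code) of the two steps the abstract names: scoring candidate HOI labels for a human-object pair by similarity between an image-region embedding and embeddings of HOI text templates, then applying a knowledge-based mask that zeroes out interactions implausible for the detected object. The embeddings here are toy vectors; in practice they would come from a pretrained text-image model such as CLIP. All function and variable names are illustrative assumptions.

```python
import numpy as np

def cosine_sim(img_emb, text_embs):
    """Cosine similarity between one image embedding (d,)
    and a bank of HOI text-template embeddings (n, d)."""
    img_emb = img_emb / np.linalg.norm(img_emb)
    text_embs = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    return text_embs @ img_emb

def generate_hoi_pseudo_labels(img_emb, text_embs, plausible_mask):
    """Return a soft pseudo-label distribution over candidate interactions.

    plausible_mask: boolean (n,) vector; False entries mark interactions
    ruled out by prior knowledge for this object category
    (the knowledge-based masking step).
    """
    sims = cosine_sim(img_emb, text_embs)
    # Masked interactions get -inf, so their softmax weight is exactly 0.
    sims = np.where(plausible_mask, sims, -np.inf)
    shifted = np.exp(sims - sims.max())
    return shifted / shifted.sum()

# Toy example: three candidate templates; the third is implausible
# for this object and is masked out.
img = np.array([1.0, 0.0, 0.0])
templates = np.array([[1.0, 0.0, 0.0],   # e.g. "a person riding a bicycle"
                      [0.0, 1.0, 0.0],   # e.g. "a person carrying a bicycle"
                      [0.0, 0.0, 1.0]])  # e.g. "a person eating a bicycle"
mask = np.array([True, True, False])
probs = generate_hoi_pseudo_labels(img, templates, mask)
```

In this toy run the first template receives the highest pseudo-label weight and the masked template receives zero, mirroring how masking removes improbable interactions before labels are assigned.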

@article{liu2025_2403.01840,
  title={FreeA: Human-object Interaction Detection using Free Annotation Labels},
  author={Qi Liu and Yuxiao Wang and Xinyu Jiang and Wolin Liang and Zhenao Wei and Yu Lei and Nan Zhuang and Weiying Xue},
  journal={arXiv preprint arXiv:2403.01840},
  year={2025}
}