FreeA: Human-object Interaction Detection using Free Annotation Labels

Recent human-object interaction (HOI) detection methods depend on extensively annotated image datasets, which require significant manpower. In this paper, we propose a novel self-adaptive, language-driven HOI detection method, termed FreeA. It leverages the adaptability of text-image models to generate latent HOI labels without manual annotation. Specifically, FreeA aligns image features of human-object pairs with HOI text templates and employs a knowledge-based masking technique to suppress improbable interactions. Furthermore, FreeA incorporates a proposed interaction correlation matching method that boosts the likelihood of actions related to a given action, thereby refining the generated HOI labels. Experiments on two benchmark datasets show that FreeA achieves state-of-the-art performance among weakly supervised HOI competitors. On the HICO-DET and V-COCO datasets, respectively, FreeA gains +13.29 (159%) mAP and +17.30 (98%) mAP over the newest "Weakly" supervised model, and +7.19 (28%) mAP and +14.69 (34%) mAP over the latest "Weakly+" supervised model, while also localizing and classifying interactive actions more accurately. The source code will be made public.
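To make the label-generation idea concrete, below is a minimal sketch (not the authors' code) of the text-image alignment step the abstract describes: each cropped human-object pair is scored against HOI text templates with a pretrained text-image model, a prior-knowledge mask suppresses improbable verbs for the detected object, and high-confidence verbs are kept as latent labels. It assumes OpenAI's public `clip` package; the prompt template, verb list, prior mask, and confidence threshold are illustrative placeholders, not FreeA's actual components.

```python
# Sketch of CLIP-style latent HOI label generation, under the assumptions above.
import torch
import clip
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Candidate actions and the object class from an off-the-shelf detector (examples).
verbs = ["riding", "holding", "feeding", "eating"]
obj = "horse"
prompts = [f"a photo of a person {v} a {obj}" for v in verbs]  # hypothetical template

# Knowledge-based mask: plausible verbs for this object, e.g. from dataset
# co-occurrence statistics; "a person eating a horse" is masked out here.
prior_mask = torch.tensor([1.0, 1.0, 1.0, 0.0], device=device)

pair_crop = preprocess(Image.open("pair.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(prompts).to(device)

with torch.no_grad():
    img_f = model.encode_image(pair_crop)
    txt_f = model.encode_text(text)
    img_f = img_f / img_f.norm(dim=-1, keepdim=True)
    txt_f = txt_f / txt_f.norm(dim=-1, keepdim=True)
    # Cosine similarities between the pair crop and each HOI template.
    scores = (100.0 * img_f @ txt_f.T).squeeze(0).softmax(dim=-1)

scores = scores * prior_mask  # suppress improbable interactions
latent_labels = [v for v, s in zip(verbs, scores.tolist()) if s > 0.3]  # placeholder threshold
print(latent_labels)
```

In the full method, these per-pair scores would additionally be refined by the interaction correlation matching step, which raises the scores of actions correlated with a confidently detected one before the latent labels are used to supervise the detector.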
@article{liu2025_2403.01840,
  title   = {FreeA: Human-object Interaction Detection using Free Annotation Labels},
  author  = {Qi Liu and Yuxiao Wang and Xinyu Jiang and Wolin Liang and Zhenao Wei and Yu Lei and Nan Zhuang and Weiying Xue},
  journal = {arXiv preprint arXiv:2403.01840},
  year    = {2025}
}