Class Agnostic Instance-level Descriptor for Visual Instance Search

Despite the great success of deep features in content-based image retrieval, visual instance search remains challenging due to the lack of an effective instance-level feature representation. Supervised and weakly supervised object detection methods are not viable options because they perform poorly on unknown object categories. In this paper, starting from the feature set output by a self-supervised ViT, instance-level region discovery is modeled as detecting compact feature subsets in a hierarchical fashion. The decomposition yields a hierarchy of feature subsets whose non-leaf and leaf nodes correspond to instance regions in the image at different semantic scales. This hierarchical decomposition effectively addresses object embedding and occlusion, both of which are widely observed in real-world scenarios. The features derived from the nodes of the hierarchy together form a comprehensive representation of the latent instances in the image. Our instance-level descriptor remains effective on both known and unknown object categories. Empirical studies on three instance search benchmarks show that it outperforms state-of-the-art methods by a considerable margin.
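The abstract does not specify how the compact feature subsets are detected, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes recursive spherical 2-means as a stand-in for the decomposition step and uses the L2-normalized mean of each subset as the node descriptor. All names (`build_hierarchy`, `compactness`, `bisect`, `collect_descriptors`) and the parameters `tau` and `min_size` are hypothetical, and the input is assumed to be an array of patch features such as those produced by a ViT.

```python
import numpy as np

def l2_normalize(x, axis=-1, eps=1e-12):
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def compactness(feats):
    """Mean cosine similarity between unit-norm features and their centroid."""
    centroid = l2_normalize(feats.mean(axis=0))
    return float((feats @ centroid).mean())

def bisect(feats, n_iter=10):
    """Split unit-norm features into two groups with spherical 2-means.

    Assumption: a simple stand-in for the paper's subset detection step.
    """
    # Deterministic init: the feature least similar to the mean direction,
    # and the feature least similar to that one.
    mean = l2_normalize(feats.mean(axis=0))
    a = int(np.argmin(feats @ mean))
    b = int(np.argmin(feats @ feats[a]))
    centers = np.stack([feats[a], feats[b]])
    for _ in range(n_iter):
        labels = np.argmax(feats @ centers.T, axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = l2_normalize(feats[labels == k].mean(axis=0))
    return labels

def build_hierarchy(feats, idx=None, tau=0.9, min_size=4):
    """Recursively decompose features until each subset is compact.

    tau and min_size are hypothetical stopping parameters.
    """
    if idx is None:
        feats = l2_normalize(np.asarray(feats, dtype=np.float64))
        idx = np.arange(len(feats))
    sub = feats[idx]
    node = {
        "indices": idx,
        "descriptor": l2_normalize(sub.mean(axis=0)),
        "children": [],
    }
    # Stop when the subset is already compact or too small to split.
    if len(idx) < 2 * min_size or compactness(sub) >= tau:
        return node
    labels = bisect(sub)
    left, right = idx[labels == 0], idx[labels == 1]
    if len(left) < min_size or len(right) < min_size:
        return node
    node["children"] = [
        build_hierarchy(feats, left, tau, min_size),
        build_hierarchy(feats, right, tau, min_size),
    ]
    return node

def collect_descriptors(node):
    """Gather descriptors from every node, non-leaf and leaf alike."""
    out = [node["descriptor"]]
    for child in node["children"]:
        out.extend(collect_descriptors(child))
    return out
```

Collecting descriptors from both non-leaf and leaf nodes mirrors the abstract's point that regions at several semantic scales contribute to the image representation: a coarse node can cover a whole object while its children cover parts or embedded instances.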
@article{sun2025_2506.16745,
  title={Class Agnostic Instance-level Descriptor for Visual Instance Search},
  author={Qi-Ying Sun and Wan-Lei Zhao and Yi-Bo Miao and Chong-Wah Ngo},
  journal={arXiv preprint arXiv:2506.16745},
  year={2025}
}