
Generalized Category Discovery under the Long-Tailed Distribution

Main: 7 pages
3 figures
Bibliography: 3 pages
15 tables
Appendix: 1 page
Abstract

This paper addresses Generalized Category Discovery (GCD) under a long-tailed distribution. GCD aims to discover novel categories in an unlabelled dataset using knowledge from a set of labelled categories. Existing works assume a uniform class distribution for both the labelled and unlabelled data, but real-world data often follows a long-tailed distribution, where a few categories contain most of the examples and the rest have only a few. Although long-tailed distributions are well studied in supervised and semi-supervised settings, they remain unexplored in the GCD context. We identify two challenges in this setting, namely balancing classifier learning and estimating the number of categories, and propose a framework based on confident sample selection and density-based clustering to tackle them. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
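
To make the long-tailed setting concrete, the sketch below generates per-class sample counts with an exponential decay profile, a convention common in long-tailed benchmarks. This is an illustration only: the abstract does not say how the paper builds its long-tailed splits, and the function name `long_tailed_counts` and the parameters `max_count` and `imbalance_ratio` are hypothetical.

```python
def long_tailed_counts(num_classes: int, max_count: int, imbalance_ratio: float) -> list[int]:
    """Return per-class sample counts that decay exponentially.

    Assumption (not taken from the paper): class 0 is the largest with
    max_count samples, and the smallest class has roughly
    max_count / imbalance_ratio samples.
    """
    if num_classes < 1:
        raise ValueError("num_classes must be at least 1")
    if imbalance_ratio < 1:
        raise ValueError("imbalance_ratio must be at least 1")
    if num_classes == 1:
        return [max_count]

    counts = []
    for i in range(num_classes):
        # Position of this class along the head-to-tail axis, in [0, 1].
        frac = i / (num_classes - 1)
        # Exponential interpolation between max_count and max_count / imbalance_ratio.
        size = round(max_count * imbalance_ratio ** (-frac))
        counts.append(max(1, size))  # keep at least one sample per class
    return counts


# Hypothetical example: 10 classes, 500 samples in the largest class, ratio 100.
counts = long_tailed_counts(num_classes=10, max_count=500, imbalance_ratio=100.0)
head_share = sum(counts[:3]) / sum(counts)  # fraction of data in the 3 largest classes
```

With these hypothetical settings, the three largest classes hold most of the samples, which is the imbalance that makes both classifier learning and category-number estimation harder than in the uniform case.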

@article{zhao2025_2506.12515,
  title={Generalized Category Discovery under the Long-Tailed Distribution},
  author={Bingchen Zhao and Kai Han},
  journal={arXiv preprint arXiv:2506.12515},
  year={2025}
}