Generalized Category Discovery under the Long-Tailed Distribution

This paper addresses Generalized Category Discovery (GCD) under a long-tailed distribution. GCD involves discovering novel categories in an unlabelled dataset using knowledge from a set of labelled categories. Existing works assume a uniform class distribution for both the labelled and unlabelled data, but real-world data often exhibits a long-tailed distribution, where a few categories contain most of the examples while the others have only a few. Although the long-tailed distribution is well studied in supervised and semi-supervised settings, it remains unexplored in the GCD context. We identify two challenges in this setting: balancing classifier learning and estimating the number of categories. To tackle them, we propose a framework based on confident sample selection and density-based clustering. Our experiments on both long-tailed and conventional GCD datasets demonstrate the effectiveness of our method.
@article{zhao2025_2506.12515,
  title={Generalized Category Discovery under the Long-Tailed Distribution},
  author={Bingchen Zhao and Kai Han},
  journal={arXiv preprint arXiv:2506.12515},
  year={2025}
}