
OmniGCD: Abstracting Generalized Category Discovery for Modality Agnosticism

Jordan Shipard
Arnold Wiliem
Kien Nguyen Thanh
Wei Xiang
Clinton Fookes
Main: 8 pages · Appendix: 6 pages · Bibliography: 3 pages · 3 figures · 14 tables
Abstract

Generalized Category Discovery (GCD) challenges methods to identify known and novel classes using partially labeled data, mirroring human category learning. Unlike prior GCD methods, which operate within a single modality and require dataset-specific fine-tuning, we propose a modality-agnostic GCD approach inspired by the human brain's abstract category formation. Our OmniGCD leverages modality-specific encoders (e.g., vision, audio, text, remote sensing) to process inputs, followed by dimension reduction to construct a GCD latent space, which is transformed at test time into a representation better suited for clustering using a novel, synthetically trained Transformer-based model. To evaluate OmniGCD, we introduce a zero-shot GCD setting in which no dataset-specific fine-tuning is allowed, enabling modality-agnostic category discovery. Trained once on synthetic data, OmniGCD performs zero-shot GCD across 16 datasets spanning four modalities, improving classification accuracy for both known and novel classes over baselines (average percentage-point improvements of +6.2, +17.9, +1.5, and +12.7 for vision, text, audio, and remote sensing, respectively). This highlights the importance of strong encoders while decoupling representation learning from category discovery: improvements to modality-agnostic methods propagate across all modalities, enabling encoder development independent of GCD. Our work serves as a benchmark for future modality-agnostic GCD research, paving the way for scalable, human-inspired category discovery. All code is available at this https URL.
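The pipeline described in the abstract (frozen modality-specific encoder → dimension reduction → transformed latent space → clustering) can be illustrated with a minimal NumPy sketch. This is an assumption-laden outline, not the authors' implementation: the actual reduction method and the synthetically trained Transformer transform are not specified in the abstract, so here the reduction is PCA-style via SVD, the transform step is omitted, and clustering is plain k-means; the array shapes and function names are illustrative only.

```python
import numpy as np

def reduce_dim(feats, k=16):
    # PCA-style dimension reduction via SVD (assumed; the abstract does
    # not name the exact reduction method used to build the latent space)
    feats = feats - feats.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(feats, full_matrices=False)
    return feats @ vt[:k].T

def kmeans(x, n_clusters, iters=50, seed=0):
    # simple k-means over the (transformed) latent space; in OmniGCD the
    # clustering would follow the Transformer-based test-time transform
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), n_clusters, replace=False)]
    for _ in range(iters):
        dists = ((x[:, None] - centers[None]) ** 2).sum(-1)
        labels = dists.argmin(1)
        for c in range(n_clusters):
            if (labels == c).any():
                centers[c] = x[labels == c].mean(0)
    return labels

# stand-in for outputs of any frozen modality encoder (vision/audio/text/...)
feats = np.random.default_rng(1).normal(size=(200, 128))
z = reduce_dim(feats, k=16)          # shared GCD latent space
labels = kmeans(z, n_clusters=5)     # discovered known + novel categories
```

Because the encoder is frozen and dataset-agnostic, swapping in a stronger encoder changes only `feats`, which reflects the paper's point that representation learning is decoupled from category discovery.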
