An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology

Chromosome analysis is vital for diagnosing genetic disorders and for guiding cancer therapy decisions through the identification of somatic clonal aberrations. However, developing AI models for this task is hindered by the overwhelming complexity and diversity of chromosomal abnormalities, which demand extensive annotation effort, while existing automated methods remain task-specific and generalize poorly because comprehensive datasets spanning diverse resource conditions are scarce. Here, we introduce CHROMA, a foundation model for cytogenomics designed to overcome these challenges by learning generalizable representations of chromosomal abnormalities. Pre-trained on over 84,000 specimens (~4 million chromosomal images) via self-supervised learning, CHROMA outperforms other methods across all types of abnormalities, even when trained on less labelled data and on more imbalanced datasets. By facilitating comprehensive mapping of instability and clonal lesions across various aberration types, CHROMA offers a scalable and generalizable solution for reliable, automated clinical analysis. It reduces the annotation workload for experts, advances precision oncology through the early detection of rare genomic abnormalities, enables broad clinical AI applications, and makes advanced genomic analysis more accessible.
@article{yang2025_2505.15868,
  title={An Inclusive Foundation Model for Generalizable Cytogenetics in Precision Oncology},
  author={Changchun Yang and Weiqian Dai and Yilan Zhang and Siyuan Chen and Jingdong Hu and Junkai Su and Yuxuan Chen and Ao Xu and Na Li and Xin Gao and Yongguo Yu},
  journal={arXiv preprint arXiv:2505.15868},
  year={2025}
}