scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data

Single-cell RNA sequencing (scRNA-seq) reveals cell heterogeneity, with cell clustering playing a key role in identifying cell types and marker genes. Recent advances, especially graph neural network (GNN)-based methods, have significantly improved clustering performance. However, the analysis of scRNA-seq data remains challenging due to noise, sparsity, and high dimensionality. Compounding these challenges, GNNs often suffer from over-smoothing, limiting their ability to capture complex biological information. In response, we propose scSiameseClu, a novel Siamese Clustering framework for interpreting single-cell RNA-seq data, comprising three key steps: (1) Dual Augmentation Module, which applies biologically informed perturbations to the gene expression matrix and cell graph relationships to enhance representation robustness; (2) Siamese Fusion Module, which combines cross-correlation refinement and adaptive information fusion to capture complex cellular relationships while mitigating over-smoothing; and (3) Optimal Transport Clustering, which utilizes Sinkhorn distance to efficiently align cluster assignments with predefined proportions while maintaining balance. Comprehensive evaluations on seven real-world datasets demonstrate that scSiameseClu outperforms state-of-the-art methods in single-cell clustering, cell type annotation, and cell type classification, providing a powerful tool for scRNA-seq data interpretation.
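To make step (3) more concrete, the following is a minimal sketch of how a generic Sinkhorn iteration can align soft cluster assignments with predefined cluster proportions. It is not the authors' implementation: the function name `sinkhorn_assign`, the parameters `epsilon` and `n_iters`, and the toy similarity scores are all assumptions made for illustration, and the code uses only the Python standard library.

```python
import math


def sinkhorn_assign(scores, proportions, epsilon=0.1, n_iters=200):
    """Soft-assign n cells to k clusters so cluster masses match `proportions`.

    scores:      n x k list of cell-to-cluster similarities (higher = closer).
    proportions: length-k list of target cluster fractions summing to 1.
    Returns an n x k matrix whose rows each sum to 1.
    Hypothetical helper for illustration only.
    """
    n, k = len(scores), len(proportions)
    # Gibbs kernel exp(score / epsilon); subtract the global max for stability.
    top = max(max(row) for row in scores)
    kernel = [[math.exp((s - top) / epsilon) for s in row] for row in scores]

    row_mass = 1.0 / n  # every cell carries the same mass
    u = [1.0] * n       # row scaling factors
    v = [1.0] * k       # column scaling factors
    for _ in range(n_iters):
        # Rescale rows so that each cell has total mass 1/n.
        u = [row_mass / sum(kernel[i][j] * v[j] for j in range(k))
             for i in range(n)]
        # Rescale columns so that cluster j has total mass proportions[j].
        v = [proportions[j] / sum(kernel[i][j] * u[i] for i in range(n))
             for j in range(k)]

    plan = [[u[i] * kernel[i][j] * v[j] for j in range(k)] for i in range(n)]
    # Normalise each row into a probability distribution over clusters.
    return [[p / sum(row) for p in row] for row in plan]


# Toy example (made-up scores): a plain argmax would put four of the six
# cells into cluster 0, while the target proportions ask for an even split.
toy_scores = [
    [0.9, 0.6],
    [0.8, 0.5],
    [0.7, 0.6],
    [0.8, 0.7],
    [0.6, 0.7],
    [0.5, 0.9],
]
soft = sinkhorn_assign(toy_scores, [0.5, 0.5])
cluster_mass = [sum(row[j] for row in soft) for j in range(2)]
print(cluster_mass)  # soft cluster sizes after balancing
```

In this sketch, `epsilon` controls how sharply the assignments follow the raw scores: smaller values give harder assignments but need more iterations and are more prone to numerical underflow. The alternating row and column rescaling is what enforces the predefined proportions on the soft assignments.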
@article{xu2025_2505.12626,
  title={scSiameseClu: A Siamese Clustering Framework for Interpreting single-cell RNA Sequencing Data},
  author={Ping Xu and Zhiyuan Ning and Pengjiang Li and Wenhao Liu and Pengyang Wang and Jiaxu Cui and Yuanchun Zhou and Pengfei Wang},
  journal={arXiv preprint arXiv:2505.12626},
  year={2025}
}