Metric Similarity and Manifold Learning of Circular Dichroism Spectra of Proteins

Abstract
We present a machine learning analysis of circular dichroism spectra of globular proteins from the SP175 database, using the optimal transport-based p-Wasserstein distance W_p (of order p) and the manifold learning algorithm t-SNE. Our results demonstrate that W_p is consistent with both Euclidean and Manhattan metrics while exhibiting robustness to noise. On the other hand, t-SNE uncovers meaningful structure in the high-dimensional data. The clustering in the t-SNE embedding is primarily determined by proteins with distinct secondary structure compositions: one cluster predominantly contains β-rich proteins, while the other consists mainly of proteins with mixed and α-helical content.
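To make the comparison between the Wasserstein distance and the Euclidean and Manhattan metrics concrete, the following is a minimal sketch in Python and not the paper's implementation. It assumes order 1 for illustration, spectra sampled on a common, evenly spaced wavelength grid, and a shift-and-normalize step (the hypothetical helper `to_distribution`) that turns a signed spectrum into a probability distribution. The spectra are synthetic Gaussian bands, not SP175 data.

```python
import math


def to_distribution(spectrum):
    """Shift a signed spectrum to be non-negative and scale it to unit mass.

    This normalization is an assumption of the sketch, not a step taken
    from the paper.
    """
    lowest = min(spectrum)
    shifted = [value - lowest for value in spectrum]
    total = sum(shifted)
    if total == 0:
        raise ValueError("a constant spectrum cannot be normalized")
    return [value / total for value in shifted]


def wasserstein_1(p, q, step=1.0):
    """Order-1 Wasserstein distance between two distributions on the same
    evenly spaced grid: the grid step times the summed absolute difference
    of the cumulative distributions."""
    cdf_gap = 0.0
    total = 0.0
    for a, b in zip(p, q):
        cdf_gap += a - b
        total += abs(cdf_gap)
    return step * total


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def gaussian_band(grid, center, width):
    """Synthetic single-band spectrum used only for illustration."""
    return [math.exp(-0.5 * ((x - center) / width) ** 2) for x in grid]


# Wavelength grid in nm with a 1 nm step (illustrative values).
grid = list(range(190, 241))
reference = to_distribution(gaussian_band(grid, 200, 3.0))
near = to_distribution(gaussian_band(grid, 210, 3.0))
far = to_distribution(gaussian_band(grid, 230, 3.0))

w_near = wasserstein_1(reference, near)
w_far = wasserstein_1(reference, far)
l1_near = manhattan(reference, near)
l1_far = manhattan(reference, far)
l2_near = euclidean(reference, near)
l2_far = euclidean(reference, far)
```

In this toy setting the Wasserstein distance keeps growing as the band moves further from the reference, whereas the Manhattan and Euclidean distances barely change once the two bands no longer overlap. This illustrates why a transport-based distance can carry information about how far spectral features are displaced, under the assumptions stated above.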
@article{marchetti2025_2504.19355,
  title={Metric Similarity and Manifold Learning of Circular Dichroism Spectra of Proteins},
  author={Gionni Marchetti},
  journal={arXiv preprint arXiv:2504.19355},
  year={2025}
}