Metric Similarity and Manifold Learning of Circular Dichroism Spectra of Proteins

Abstract

We present a machine learning analysis of circular dichroism spectra of globular proteins from the SP175 database, using the optimal transport-based $1$-Wasserstein distance $\mathcal{W}_1$ (with order $p=1$) and the manifold learning algorithm $t$-SNE. Our results demonstrate that $\mathcal{W}_1$ is consistent with both Euclidean and Manhattan metrics while exhibiting robustness to noise. On the other hand, $t$-SNE uncovers meaningful structure in the high-dimensional data. The clustering in the $t$-SNE embedding is primarily determined by proteins with distinct secondary structure compositions: one cluster predominantly contains $\beta$-rich proteins, while the other consists mainly of proteins with mixed $\alpha/\beta$ and $\alpha$-helical content.
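
As a rough illustration of the kind of comparison the abstract describes, the sketch below computes a one-dimensional $\mathcal{W}_1$ distance alongside Manhattan and Euclidean distances for two curves on a shared wavelength grid. It is not the paper's pipeline. The abstract does not say how signed CD spectra are mapped to distributions, so this example assumes synthetic, non-negative toy bands normalized to unit mass. The names `w1_distance` and `toy_band`, the grid, and the band positions are all hypothetical.

```python
import math

def w1_distance(grid, f, g):
    """1-Wasserstein distance between two non-negative curves on a shared, sorted grid.

    Assumption: each curve is normalized to unit mass and treated as a discrete
    distribution on the grid points. In 1D, W1 is then the integral of the
    absolute difference between the two cumulative distribution functions.
    """
    sf, sg = sum(f), sum(g)
    cf = cg = 0.0
    total = 0.0
    for i in range(len(grid) - 1):
        cf += f[i] / sf  # CDF of the first curve up to grid[i]
        cg += g[i] / sg  # CDF of the second curve up to grid[i]
        total += abs(cf - cg) * (grid[i + 1] - grid[i])
    return total

def toy_band(grid, center, width=8.0):
    """Hypothetical Gaussian-shaped band; a stand-in, not real CD data."""
    return [math.exp(-0.5 * ((x - center) / width) ** 2) for x in grid]

# Hypothetical wavelength grid in nm (175 to 260 nm in 0.5 nm steps).
grid = [175.0 + 0.5 * i for i in range(171)]
a = toy_band(grid, 208.0)
b = toy_band(grid, 213.0)  # same band shifted by 5 nm

d_w1 = w1_distance(grid, a, b)
d_l1 = sum(abs(x - y) for x, y in zip(a, b))               # Manhattan
d_l2 = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))  # Euclidean

print(f"W1 = {d_w1:.3f} nm, L1 = {d_l1:.3f}, L2 = {d_l2:.3f}")
```

For two identically shaped bands, $\mathcal{W}_1$ is approximately the size of the shift along the wavelength axis, whereas the Manhattan and Euclidean distances compare intensities point by point.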

@article{marchetti2025_2504.19355,
  title={Metric Similarity and Manifold Learning of Circular Dichroism Spectra of Proteins},
  author={Gionni Marchetti},
  journal={arXiv preprint arXiv:2504.19355},
  year={2025}
}