Sleep profoundly affects our health, and sleep deficiency or sleep disorders can cause physical and mental problems. Despite significant findings from previous studies, challenges persist in optimizing deep learning models for high-accuracy sleep stage classification, especially in multi-modal learning. Our research introduces MC2SleepNet (Multi-modal Cross-masking with Contrastive learning for Sleep stage classification Network). It aims to make Convolutional Neural Networks (CNNs) and Transformer architectures collaborate effectively in multi-modal training, using contrastive learning and cross-masking. Raw single-channel EEG signals and their corresponding spectrograms serve as the two modalities, each with distinct characteristics. MC2SleepNet achieves state-of-the-art performance, with accuracies of 84.6% on SleepEDF-78 and 88.6% on the Sleep Heart Health Study (SHHS). These results demonstrate that the proposed network generalizes effectively across both small and large datasets.
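As a rough illustration of the two ingredients named in the abstract, the sketch below derives a spectrogram from a raw single-channel EEG epoch and computes a generic symmetric InfoNCE contrastive loss between paired embeddings. It is not the authors' implementation: the abstract does not specify the loss, the spectrogram settings, or the encoders, so the function names, the 100 Hz sampling rate, the 30-second epoch, the window and hop sizes, and the temperature are all assumptions chosen for illustration.

```python
import numpy as np


def epoch_spectrogram(x, win=200, hop=100):
    """Log-power spectrogram of one EEG epoch via a Hann-windowed short-time FFT.

    win and hop are in samples (assumed values, not taken from the paper).
    """
    window = np.hanning(win)
    n_frames = 1 + (len(x) - win) // hop
    frames = np.stack(
        [x[i * hop : i * hop + win] * window for i in range(n_frames)]
    )
    power = np.abs(np.fft.rfft(frames, axis=1)) ** 2
    return np.log(power + 1e-10)  # shape: (n_frames, win // 2 + 1)


def info_nce(za, zb, temperature=0.1):
    """Symmetric InfoNCE loss: row i of za and row i of zb form a positive pair."""
    za = za / np.linalg.norm(za, axis=1, keepdims=True)
    zb = zb / np.linalg.norm(zb, axis=1, keepdims=True)
    logits = za @ zb.T / temperature  # pairwise cosine similarities, scaled

    def cross_entropy_on_diagonal(scores):
        scores = scores - scores.max(axis=1, keepdims=True)  # numerical stability
        log_prob = scores - np.log(np.exp(scores).sum(axis=1, keepdims=True))
        return -np.mean(np.diag(log_prob))

    # Average the two directions: modality A to B, and B to A.
    return 0.5 * (
        cross_entropy_on_diagonal(logits) + cross_entropy_on_diagonal(logits.T)
    )


# Hypothetical usage with random data standing in for real recordings.
rng = np.random.default_rng(0)
epoch = rng.standard_normal(3000)       # assumed: 30 s sampled at 100 Hz
spec = epoch_spectrogram(epoch)         # second modality derived from the first

# Placeholder embeddings; in a real model these would come from the two encoders.
z_signal = rng.standard_normal((8, 16))
z_spec = rng.standard_normal((8, 16))
loss = info_nce(z_signal, z_spec)
```

In a full model, the raw epoch and its spectrogram would each pass through their own encoder, and a loss of this kind would pull the two embeddings of the same epoch together while pushing apart embeddings of different epochs. How MC2SleepNet combines this with cross-masking is described in the paper itself, not in this sketch.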
@article{na2025_2502.17470,
  title={MC2SleepNet: Multi-modal Cross-masking with Contrastive Learning for Sleep Stage Classification},
  author={Younghoon Na and Hyun Keun Ahn and Hyun-Kyung Lee and Yoongeol Lee and Seung Hun Oh and Hongkwon Kim and Jeong-Gun Lee},
  journal={arXiv preprint arXiv:2502.17470},
  year={2025}
}