A Cytology Dataset for Early Detection of Oral Squamous Cell Carcinoma

Abstract

Oral squamous cell carcinoma (OSCC) is a major global health burden, particularly in several regions across Asia, Africa, and South America, where it accounts for a significant proportion of cancer cases. Early detection dramatically improves outcomes, with stage I cancers achieving up to 90 percent survival. However, traditional diagnosis based on histopathology has limited accessibility in low-resource settings because it is invasive, resource-intensive, and reliant on expert pathologists. Oral cytology from brush biopsy, by contrast, offers a minimally invasive and lower-cost alternative, provided that the remaining challenges, namely inter-observer variability and the unavailability of expert pathologists, can be addressed using artificial intelligence (AI). Developing and validating robust AI solutions requires access to large, labeled, multi-source datasets for training high-capacity models that generalize across domain shifts. We introduce the first large, multicenter oral cytology dataset, comprising annotated slides stained with the Papanicolaou (PAP) and May-Grunwald-Giemsa (MGG) protocols and collected from ten tertiary medical centers in India. The dataset is labeled and annotated by expert pathologists for cellular anomaly classification and detection, and is designed to advance AI-driven diagnostic methods. By filling the gap in publicly available oral cytology datasets, this resource aims to enhance automated detection, reduce diagnostic errors, and improve early OSCC diagnosis in resource-constrained settings, ultimately contributing to reduced mortality and better patient outcomes worldwide.

@article{jain2025_2506.09661,
  title={A Cytology Dataset for Early Detection of Oral Squamous Cell Carcinoma},
  author={Garima Jain and Sanghamitra Pati and Mona Duggal and Amit Sethi and Abhijeet Patil and Gururaj Malekar and Nilesh Kowe and Jitender Kumar and Jatin Kashyap and Divyajeet Rout and Deepali and Hitesh and Nishi Halduniya and Sharat Kumar and Heena Tabassum and Rupinder Singh Dhaliwal and Sucheta Devi Khuraijam and Sushma Khuraijam and Sharmila Laishram and Simmi Kharb and Sunita Singh and K. Swaminadtan and Ranjana Solanki and Deepika Hemranjani and Shashank Nath Singh and Uma Handa and Manveen Kaur and Surinder Singhal and Shivani Kalhan and Rakesh Kumar Gupta and Ravi. S and D. Pavithra and Sunil Kumar Mahto and Arvind Kumar and Deepali Tirkey and Saurav Banerjee and L. Sreelakshmi},
  journal={arXiv preprint arXiv:2506.09661},
  year={2025}
}