Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments

29 May 2025

Abstract

A key ethical challenge in Automated Essay Scoring (AES) is ensuring that scores are only released when they meet high reliability standards. Confidence modelling addresses this by assigning a reliability estimate, in the form of a confidence score, to each automated score. In this study, we frame confidence estimation as a classification task: predicting whether an AES-generated score correctly places a candidate in the appropriate CEFR level. While this is a binary decision, we leverage the inherent granularity of the scoring domain in two ways. First, we reformulate the task as an n-ary classification problem using score binning. Second, we introduce a set of novel Kernel Weighted Ordinal Categorical Cross Entropy (KWOCCE) loss functions that incorporate the ordinal structure of CEFR labels. Our best-performing model achieves an F1 score of 0.97 and enables the system to release 47% of scores with 100% CEFR agreement and 99% of scores with at least 95% CEFR agreement, compared with approximately 92% CEFR agreement from the standalone AES model, where all AM-predicted scores are released.
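The release figures quoted in the abstract pair a coverage (the share of scores released) with a CEFR agreement rate among the released scores. A minimal sketch of how such a pair could be computed from confidence scores follows. The function name `release_stats`, the threshold rule (release when confidence is at or above a cut-off), and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Sequence, Tuple


def release_stats(
    confidences: Sequence[float],
    agrees: Sequence[bool],
    threshold: float,
) -> Tuple[float, float]:
    """Return (coverage, agreement) for a confidence cut-off.

    confidences: hypothetical confidence score per automated score.
    agrees: whether each automated score falls in the correct CEFR level.
    threshold: assumed rule, release a score when confidence >= threshold.
    """
    # Keep the agreement flags of the scores that pass the cut-off.
    released = [a for c, a in zip(confidences, agrees) if c >= threshold]
    # Coverage: fraction of all scores that are released.
    coverage = len(released) / len(confidences)
    # Agreement: fraction of released scores in the correct CEFR level.
    agreement = sum(released) / len(released) if released else float("nan")
    return coverage, agreement


# Toy data, invented for illustration only.
confidences = [0.99, 0.95, 0.90, 0.60, 0.40]
agrees = [True, True, True, False, True]

print(release_stats(confidences, agrees, 0.9))  # (0.6, 1.0)
print(release_stats(confidences, agrees, 0.0))  # (1.0, 0.8)
```

Raising the threshold trades coverage for agreement: in the toy data, releasing everything gives 80% agreement, while a cut-off of 0.9 releases 60% of scores with full agreement.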


@article{chakravarty2025_2505.23315,
  title={Enhancing Marker Scoring Accuracy through Ordinal Confidence Modelling in Educational Assessments},
  author={Abhirup Chakravarty and Mark Brenchley and Trevor Breakspear and Ian Lewin and Yan Huang},
  journal={arXiv preprint arXiv:2505.23315},
  year={2025}
}
