A Proposed S.C.O.R.E. Evaluation Framework for Large Language Models: Safety, Consensus, Objectivity, Reproducibility and Explainability
Ting Fang Tan
Kabilan Elangovan
J. Ong
Nigam Shah
J. Sung
T. Y. Wong
Lan Xue
Nan Liu
Haibo Wang
Chang Fu Kuo
Simon Chesterman
Zee Kin Yeong
Daniel Ting

Abstract
A comprehensive qualitative evaluation framework is needed for large language models (LLMs) in healthcare that expands beyond traditional accuracy and quantitative metrics. We propose five key aspects for the evaluation of LLMs: Safety, Consensus, Objectivity, Reproducibility and Explainability (S.C.O.R.E.). We suggest that S.C.O.R.E. may form the basis of an evaluation framework for future LLM-based models that are safe, reliable, trustworthy, and ethical for healthcare and clinical applications.