CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation

Existing metrics often lack the granularity and interpretability to capture nuanced clinical differences between candidate and ground-truth radiology reports, resulting in suboptimal evaluation. We introduce a Clinically-grounded tabular framework with Expert-curated labels and Attribute-level comparison for Radiology report evaluation (CLEAR). CLEAR not only examines whether a report can accurately identify the presence or absence of medical conditions, but also assesses whether it can precisely describe each positively identified condition across five key attributes: first occurrence, change, severity, descriptive location, and recommendation. Compared to prior works, CLEAR's multi-dimensional, attribute-level outputs enable a more comprehensive and clinically interpretable evaluation of report quality. Additionally, to measure the clinical alignment of CLEAR, we collaborate with five board-certified radiologists to develop CLEAR-Bench, a dataset of 100 chest X-ray reports from MIMIC-CXR, annotated across 6 curated attributes and 13 CheXpert conditions. Our experiments show that CLEAR achieves high accuracy in extracting clinical attributes and provides automated metrics that are strongly aligned with clinical judgment.
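As a rough illustration of what an attribute-level, tabular comparison could look like, the sketch below models one table row per condition: a presence flag plus the five attributes named above. The class `ConditionRow`, the function `compare_rows`, and the exact-string matching rule are all hypothetical and chosen only for illustration; the abstract does not specify CLEAR's actual schema or scoring.

```python
from dataclasses import dataclass
from typing import Dict, Optional

# The five attributes listed in the abstract (field names are assumptions).
ATTRIBUTES = (
    "first_occurrence",
    "change",
    "severity",
    "descriptive_location",
    "recommendation",
)


@dataclass
class ConditionRow:
    """One hypothetical table row: a condition, its presence, and its attributes."""
    condition: str
    present: bool
    first_occurrence: Optional[str] = None
    change: Optional[str] = None
    severity: Optional[str] = None
    descriptive_location: Optional[str] = None
    recommendation: Optional[str] = None


def compare_rows(candidate: ConditionRow, reference: ConditionRow) -> Dict[str, bool]:
    """Compare a candidate row with a reference row for the same condition.

    Presence is always compared. Attributes are compared only when both
    reports mark the condition as present. Exact string equality is an
    illustrative assumption, not the paper's scoring rule.
    """
    result = {"presence_match": candidate.present == reference.present}
    if candidate.present and reference.present:
        for name in ATTRIBUTES:
            result[name] = getattr(candidate, name) == getattr(reference, name)
    return result


# Made-up example values, not taken from CLEAR-Bench.
reference = ConditionRow(
    "Pleural Effusion", True,
    change="unchanged", severity="small", descriptive_location="left",
)
candidate = ConditionRow(
    "Pleural Effusion", True,
    change="unchanged", severity="moderate", descriptive_location="left",
)
scores = compare_rows(candidate, reference)
```

In this toy case the two rows agree on presence, change, and location but differ on severity, which is the kind of per-attribute disagreement a single aggregate score would hide.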
@article{jiang2025_2505.16325,
  title={CLEAR: A Clinically-Grounded Tabular Framework for Radiology Report Evaluation},
  author={Yuyang Jiang and Chacha Chen and Shengyuan Wang and Feng Li and Zecong Tang and Benjamin M. Mervak and Lydia Chelala and Christopher M Straus and Reve Chahine and Samuel G. Armato III and Chenhao Tan},
  journal={arXiv preprint arXiv:2505.16325},
  year={2025}
}