Generating structured code comments effectively requires robust quality metrics for dataset curation, yet existing approaches (SIDE, MIDQ, STASIS) analyze the relationship between code and comment only to a limited extent. We propose CIDRe, a language-agnostic, reference-free quality criterion that combines four synergistic aspects: (1) relevance (code-comment semantic alignment), (2) informativeness (functional coverage), (3) completeness (presence of all structural sections), and (4) description length (detail sufficiency). We validate the criterion on a manually annotated dataset. Experiments show that CIDRe outperforms existing metrics, improving results in cross-entropy evaluation. When CIDRe is used to filter comments, models fine-tuned on the filtered data show statistically significant quality gains in GPT-4o-mini assessments.
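To make the four aspects concrete, the following is a minimal sketch of how per-aspect scores might be combined and used to filter a dataset. It is an illustration only, not the method from the paper: the abstract does not state how the aspects are computed or combined, so the section markers, the `target_words` parameter, the unweighted mean, and the `threshold` value are all assumptions, and the relevance and informativeness scores are taken as given inputs in [0, 1].

```python
from dataclasses import dataclass

# Hypothetical section markers for a structured, docstring-style comment.
REQUIRED_SECTIONS = ("Args:", "Returns:")


@dataclass
class AspectScores:
    relevance: float        # code-comment semantic alignment, assumed in [0, 1]
    informativeness: float  # functional coverage, assumed in [0, 1]
    completeness: float     # share of required sections present, in [0, 1]
    length: float           # description length scaled to [0, 1]


def completeness(comment: str, sections=REQUIRED_SECTIONS) -> float:
    """Fraction of the required section markers that occur in the comment."""
    present = sum(1 for marker in sections if marker in comment)
    return present / len(sections)


def length_score(comment: str, target_words: int = 30) -> float:
    """Word count relative to an assumed target, capped at 1.0."""
    return min(len(comment.split()) / target_words, 1.0)


def combined_score(scores: AspectScores) -> float:
    """Assumption: an unweighted mean of the four aspects."""
    total = (scores.relevance + scores.informativeness
             + scores.completeness + scores.length)
    return total / 4


def keep(scores: AspectScores, threshold: float = 0.5) -> bool:
    """Keep a comment when its combined score reaches the assumed threshold."""
    return combined_score(scores) >= threshold
```

A dataset filter would call `keep` on each code-comment pair and retain only the pairs that pass. In practice the relevance and informativeness inputs would come from a learned model, which this sketch deliberately leaves out.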
@article{dziuba2025_2505.19757,
  title={CIDRe: A Reference-Free Multi-Aspect Criterion for Code Comment Quality Measurement},
  author={Maria Dziuba and Valentin Malykh},
  journal={arXiv preprint arXiv:2505.19757},
  year={2025}
}