We introduce two reference-free metrics for quality evaluation of taxonomies. The first metric evaluates robustness by calculating the correlation between semantic and taxonomic similarity, covering a type of error not handled by existing metrics. The second uses Natural Language Inference to assess logical adequacy. Both metrics are tested on five taxonomies and are shown to correlate well with F1 against gold-standard taxonomies.
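To make the first metric's idea concrete, here is a minimal sketch of correlating semantic similarity with taxonomic similarity over all concept pairs. The abstract does not specify the similarity measures or the correlation coefficient, so everything here is an assumption for illustration: the toy taxonomy, the hand-made 2-D vectors standing in for text embeddings, the path-based taxonomic similarity, cosine similarity, and Pearson correlation. It should not be read as the authors' implementation.

```python
import math
from itertools import combinations

# Hypothetical toy taxonomy as child -> parent edges (the root maps to None).
PARENT = {
    "animal": None,
    "mammal": "animal",
    "bird": "animal",
    "dog": "mammal",
    "cat": "mammal",
    "sparrow": "bird",
}

# Hypothetical hand-made 2-D vectors; a real setup would use a text encoder.
EMBEDDING = {
    "animal": (1.0, 1.0),
    "mammal": (1.0, 0.6),
    "bird": (0.6, 1.0),
    "dog": (1.0, 0.3),
    "cat": (1.0, 0.4),
    "sparrow": (0.3, 1.0),
}


def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def taxonomic_similarity(a, b, parent):
    """Assumed measure: 1 / (1 + number of edges on the tree path a..b)."""
    steps_from_b = {n: i for i, n in enumerate(path_to_root(b, parent))}
    for steps_from_a, n in enumerate(path_to_root(a, parent)):
        if n in steps_from_b:  # first common ancestor
            return 1.0 / (1.0 + steps_from_a + steps_from_b[n])
    raise ValueError("nodes do not share a root")


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def similarity_correlation(parent, embedding):
    """Correlate semantic and taxonomic similarity over all concept pairs."""
    semantic, taxonomic = [], []
    for a, b in combinations(sorted(parent), 2):
        semantic.append(cosine(embedding[a], embedding[b]))
        taxonomic.append(taxonomic_similarity(a, b, parent))
    return pearson(semantic, taxonomic)


if __name__ == "__main__":
    print(similarity_correlation(PARENT, EMBEDDING))
```

No gold-standard taxonomy appears anywhere in the computation, which is what makes a score of this kind reference-free: it only asks whether concepts that sit close together in the tree are also close in meaning.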
@article{wullschleger2025_2505.11470,
  title={No Gold Standard, No Problem: Reference-Free Evaluation of Taxonomies},
  author={Pascal Wullschleger and Majid Zarharan and Donnacha Daly and Marc Pouly and Jennifer Foster},
  journal={arXiv preprint arXiv:2505.11470},
  year={2025}
}