Addressing the Scarcity of Benchmarks for Graph XAI

While Graph Neural Networks (GNNs) have become the de facto model for learning from structured data, their decision process remains opaque to the end user, undermining their deployment in safety-critical applications. In the case of graph classification, Explainable Artificial Intelligence (XAI) techniques address this issue by identifying sub-graph motifs that explain predictions. However, advancements in this field are hindered by a chronic scarcity of benchmark datasets with known ground-truth motifs against which to assess the quality of explanations. Current graph XAI benchmarks are limited to synthetic data or a handful of real-world tasks hand-curated by domain experts. In this paper, we propose a general method to automate the construction of XAI benchmarks for graph classification from real-world datasets. We provide 15 ready-made benchmarks, as well as the code to generate more than 2000 additional XAI benchmarks with our method. As a use case, we employ our benchmarks to assess the effectiveness of some popular graph explainers.
@article{fontanesi2025_2505.12437,
  title={Addressing the Scarcity of Benchmarks for Graph XAI},
  author={Michele Fontanesi and Alessio Micheli and Marco Podda and Domenico Tortorella},
  journal={arXiv preprint arXiv:2505.12437},
  year={2025}
}