How Grounded is Wikipedia? A Study on Structured Evidential Support

Wikipedia is a critical resource for modern NLP, serving as a rich repository of up-to-date and citation-backed information on a wide variety of subjects. The reliability of Wikipedia -- its groundedness in its cited sources -- is vital to this purpose. This work provides a quantitative analysis of the extent to which Wikipedia *is* so grounded and of how readily grounding evidence may be retrieved. To this end, we introduce PeopleProfiles -- a large-scale, multi-level dataset of claim support annotations on Wikipedia articles about notable people. We show that roughly 20% of claims in Wikipedia *lead* sections are unsupported by the article body; roughly 27% of annotated claims in the article *body* are unsupported by their (publicly accessible) cited sources; and more than 80% of lead claims cannot be traced to these sources via annotated body evidence. Further, we show that recovery of complex grounding evidence for claims that *are* supported remains a challenge for standard retrieval methods.
@article{walden2025_2506.12637,
  title={How Grounded is Wikipedia? A Study on Structured Evidential Support},
  author={William Walden and Kathryn Ricci and Miriam Wanner and Zhengping Jiang and Chandler May and Rongkun Zhou and Benjamin Van Durme},
  journal={arXiv preprint arXiv:2506.12637},
  year={2025}
}