
How Grounded is Wikipedia? A Study on Structured Evidential Support

Main: 3 Pages
Bibliography: 3 Pages
Appendix: 11 Pages
4 Figures
5 Tables
Abstract

Wikipedia is a critical resource for modern NLP, serving as a rich repository of up-to-date and citation-backed information on a wide variety of subjects. The reliability of Wikipedia -- its groundedness in its cited sources -- is vital to this purpose. This work provides a quantitative analysis of the extent to which Wikipedia *is* so grounded and of how readily grounding evidence may be retrieved. To this end, we introduce PeopleProfiles -- a large-scale, multi-level dataset of claim support annotations on Wikipedia articles of notable people. We show that roughly 20% of claims in Wikipedia *lead* sections are unsupported by the article body; roughly 27% of annotated claims in the article *body* are unsupported by their (publicly accessible) cited sources; and >80% of lead claims cannot be traced to these sources via annotated body evidence. Further, we show that recovery of complex grounding evidence for claims that *are* supported remains a challenge for standard retrieval methods.

@article{walden2025_2506.12637,
  title={How Grounded is Wikipedia? A Study on Structured Evidential Support},
  author={William Walden and Kathryn Ricci and Miriam Wanner and Zhengping Jiang and Chandler May and Rongkun Zhou and Benjamin Van Durme},
  journal={arXiv preprint arXiv:2506.12637},
  year={2025}
}