OntoRAG: Enhancing Question-Answering through Automated Ontology Derivation from Unstructured Knowledge Bases

Ontologies are pivotal for structuring knowledge bases to enhance question answering (QA) systems powered by Large Language Models (LLMs). However, traditional ontology creation relies on manual effort by domain experts, a process that is time-intensive, error-prone, and impractical for large, dynamic knowledge domains. This paper introduces OntoRAG, an automated pipeline designed to derive ontologies from unstructured knowledge bases, with a focus on electrical relay documents. OntoRAG integrates web scraping, PDF parsing, hybrid chunking, information extraction, knowledge graph construction, and ontology creation to transform unstructured data into a queryable ontology. By leveraging LLMs and graph-based methods, OntoRAG enhances global sensemaking capabilities, outperforming conventional Retrieval-Augmented Generation (RAG) and GraphRAG approaches in comprehensiveness and diversity. Experimental results demonstrate OntoRAG's effectiveness: it achieves a comprehensiveness win rate of 85% against vector RAG and 75% against GraphRAG's best configuration. This work addresses the critical challenge of automating ontology creation, advancing the vision of the semantic web.
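
The win rates quoted above are head-to-head figures: for each question, two systems' answers are compared and one is preferred. The abstract does not say how the comparisons are scored, so the following is only a minimal sketch of how such a rate could be computed. The function name `win_rate`, the label strings, the half-credit treatment of ties, and the sample judgments are all illustrative assumptions, not the paper's procedure or data.

```python
from collections import Counter


def win_rate(judgments, system="OntoRAG"):
    """Return the share of pairwise judgments won by `system`.

    `judgments` holds one label per question: the name of the preferred
    system, or "tie". Counting a tie as half a win is an assumption made
    for this sketch; the abstract does not specify tie handling.
    """
    if not judgments:
        raise ValueError("at least one judgment is required")
    counts = Counter(judgments)
    wins = counts.get(system, 0)
    ties = counts.get("tie", 0)
    return (wins + 0.5 * ties) / len(judgments)


# Hypothetical judge outputs for 20 questions (not data from the paper).
judgments = ["OntoRAG"] * 17 + ["VectorRAG"] * 3
rate = win_rate(judgments)
print(f"Comprehensiveness win rate: {rate:.0%}")
```

With these made-up labels, 17 wins out of 20 comparisons give a rate of 0.85, which is how a figure such as "85% against vector RAG" would be read under this scoring.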
@article{tiwari2025_2506.00664,
  title={OntoRAG: Enhancing Question-Answering through Automated Ontology Derivation from Unstructured Knowledge Bases},
  author={Yash Tiwari and Owais Ahmad Lone and Mayukha Pal},
  journal={arXiv preprint arXiv:2506.00664},
  year={2025}
}