Relational databases are central to modern data management, yet most data exists in unstructured forms like text documents. To bridge this gap, we leverage large language models (LLMs) to automatically synthesize a relational database by generating its schema and populating its tables from raw text. We introduce SQUiD, a novel neurosymbolic framework that decomposes this task into four stages, each with specialized techniques. Our experiments show that SQUiD consistently outperforms baselines across diverse datasets.
@article{sadia2025_2505.19025,
  title={SQUiD: Synthesizing Relational Databases from Unstructured Text},
  author={Mushtari Sadia and Zhenning Yang and Yunming Xiao and Ang Chen and Amrita Roy Chowdhury},
  journal={arXiv preprint arXiv:2505.19025},
  year={2025}
}