Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models

Information extraction (IE) plays a crucial role in natural language processing (NLP) by converting unstructured text into structured knowledge. Deploying computationally intensive large language models (LLMs) on resource-constrained devices for information extraction is challenging, particularly due to hallucinations, limited context length, and high latency, especially when handling diverse extraction schemas. To address these challenges, we propose a two-stage information extraction approach adapted to on-device LLMs, called Dual-LoRA with Incremental Schema Caching (DLISC), which enhances both schema identification and schema-aware extraction in terms of effectiveness and efficiency. In particular, DLISC adopts an Identification LoRA module that retrieves the schemas most relevant to a given query, and an Extraction LoRA module that performs information extraction conditioned on the selected schemas. To accelerate extraction inference, Incremental Schema Caching is incorporated to reduce redundant computation, substantially improving efficiency. Extensive experiments across multiple information extraction datasets demonstrate notable improvements in both effectiveness and efficiency.
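To make the two-stage design concrete, the sketch below mocks up a DLISC-style serving loop on a Hugging Face transformers + peft stack. This is a minimal illustration, not the paper's implementation: the base model name, adapter paths, prompt formats, and the schema-selection parsing are hypothetical placeholders, and the caching shown (precomputing and reusing a KV cache per schema prompt) is one plausible reading of Incremental Schema Caching.

```python
# Minimal DLISC-style sketch. Assumes transformers + peft; all paths,
# prompts, and the schema-selection parsing are hypothetical.
import copy
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "Qwen/Qwen2-0.5B-Instruct"  # hypothetical on-device base LLM
tok = AutoTokenizer.from_pretrained(BASE)
base = AutoModelForCausalLM.from_pretrained(BASE)

# Both stages share one base model; only the LoRA adapter is swapped.
model = PeftModel.from_pretrained(base, "path/to/identification-lora",
                                  adapter_name="identify")
model.load_adapter("path/to/extraction-lora", adapter_name="extract")
model.eval()

# Incremental Schema Caching (as read here): prefill each schema prompt
# once, cache its KV states, and reuse them for every later query.
schema_kv_cache: dict[str, tuple] = {}

@torch.no_grad()
def cached_schema_prefix(schema_prompt: str):
    if schema_prompt not in schema_kv_cache:
        ids = tok(schema_prompt, return_tensors="pt").input_ids
        out = model(ids, use_cache=True)
        schema_kv_cache[schema_prompt] = (ids, out.past_key_values)
    ids, kv = schema_kv_cache[schema_prompt]
    # generate() mutates the cache in place, so hand it a copy.
    return ids, copy.deepcopy(kv)

@torch.no_grad()
def dlisc(query: str, all_schemas: dict[str, str]) -> str:
    # Stage 1: Identification LoRA picks a schema for the query
    # (single-schema selection and exact-match parsing are simplifications).
    model.set_adapter("identify")
    id_ids = tok(f"Select a schema for: {query}\nCandidates: {list(all_schemas)}\n",
                 return_tensors="pt").input_ids
    gen = model.generate(id_ids, max_new_tokens=32)
    picked = tok.decode(gen[0, id_ids.shape[-1]:], skip_special_tokens=True).strip()
    schema_prompt = all_schemas.get(picked, next(iter(all_schemas.values())))

    # Stage 2: Extraction LoRA extracts, reusing the cached schema prefix;
    # recent transformers versions prefill only the tokens past the cache.
    model.set_adapter("extract")
    prefix_ids, kv = cached_schema_prefix(schema_prompt)
    query_ids = tok(query, return_tensors="pt").input_ids
    full_ids = torch.cat([prefix_ids, query_ids], dim=-1)
    out = model.generate(full_ids, past_key_values=kv, max_new_tokens=256)
    return tok.decode(out[0, full_ids.shape[-1]:], skip_special_tokens=True)
```

Under these assumptions, swapping LoRA adapters on a shared base model keeps the memory footprint close to that of a single on-device model, while the per-schema KV cache means each schema prompt is prefilled only once across repeated queries.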
@article{wen2025_2505.14992,
  title   = {Effective and Efficient Schema-aware Information Extraction Using On-Device Large Language Models},
  author  = {Zhihao Wen and Sheng Liang and Yaxiong Wu and Yongyue Zhang and Yong Liu},
  journal = {arXiv preprint arXiv:2505.14992},
  year    = {2025}
}