OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit

Abstract

We present OnPrem.LLM, a Python-based toolkit for applying large language models (LLMs) to sensitive, non-public data in offline or restricted environments. The system is designed for privacy-preserving use cases and provides prebuilt pipelines for document processing and storage, retrieval-augmented generation (RAG), information extraction, summarization, classification, and prompt/output processing with minimal configuration. OnPrem.LLM supports multiple LLM backends -- including llama.cpp, Ollama, vLLM, and Hugging Face Transformers -- with quantized model support, GPU acceleration, and seamless backend switching. Although designed for fully local execution, OnPrem.LLM also supports integration with a wide range of cloud LLM providers when permitted, enabling hybrid deployments that balance performance with data control. A no-code web interface extends accessibility to non-technical users.

@article{maiya2025_2505.07672,
  title={OnPrem.LLM: A Privacy-Conscious Document Intelligence Toolkit},
  author={Arun S. Maiya},
  journal={arXiv preprint arXiv:2505.07672},
  year={2025}
}