A Grounded Memory System For Smart Personal Assistants

A wide variety of agentic AI applications - ranging from cognitive assistants for dementia patients to robotics - demand a robust memory system grounded in reality. In this paper, we propose such a memory system consisting of three components. First, we combine Vision Language Models for image captioning and entity disambiguation with Large Language Models for consistent information extraction during perception. Second, the extracted information is represented in a memory consisting of a knowledge graph enhanced by vector embeddings to efficiently manage relational information. Third, we combine semantic search and graph query generation for question answering via Retrieval Augmented Generation. We illustrate the system's operation and potential using a real-world example.
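To make the three-component architecture concrete, the following is a minimal sketch in Python of how a knowledge graph with node embeddings and RAG-style question answering could fit together. All names (MemoryGraph, semantic_search, graph_query, answer, and the embed and llm callables) are illustrative placeholders, not the authors' implementation.

    # Minimal sketch of the perception -> memory -> retrieval pipeline.
    # `embed` and `llm` are assumed to be user-supplied callables
    # (e.g., an embedding model and a language model); they are not
    # part of the paper's described system.

    from dataclasses import dataclass, field


    @dataclass
    class MemoryGraph:
        """Knowledge graph whose nodes carry vector embeddings."""
        triples: list = field(default_factory=list)    # (subject, relation, object)
        embeddings: dict = field(default_factory=dict)  # node -> vector

        def add(self, subj, rel, obj, embed):
            """Store a relational fact and embed the entities it mentions."""
            self.triples.append((subj, rel, obj))
            for node in (subj, obj):
                self.embeddings.setdefault(node, embed(node))

        def semantic_search(self, query_vec, top_k=3):
            """Rank nodes by cosine similarity to the query embedding."""
            def cos(a, b):
                dot = sum(x * y for x, y in zip(a, b))
                na = sum(x * x for x in a) ** 0.5
                nb = sum(y * y for y in b) ** 0.5
                return dot / (na * nb + 1e-9)
            ranked = sorted(self.embeddings.items(),
                            key=lambda kv: cos(query_vec, kv[1]),
                            reverse=True)
            return [node for node, _ in ranked[:top_k]]

        def graph_query(self, node):
            """Return all stored triples that mention the given node."""
            return [t for t in self.triples if node in (t[0], t[2])]


    def answer(question, graph, embed, llm):
        """RAG-style answering: semantic retrieval, graph expansion, generation."""
        seeds = graph.semantic_search(embed(question))
        context = [t for s in seeds for t in graph.graph_query(s)]
        return llm(question, context)

In this sketch, perception would populate the graph via add() with triples extracted by the LLM from VLM captions, while answer() combines semantic search over embeddings with graph queries to ground the generated response, mirroring the division of labor described in the abstract.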
@article{ocker2025_2505.06328,
  title   = {A Grounded Memory System For Smart Personal Assistants},
  author  = {Felix Ocker and Jörg Deigmöller and Pavel Smirnov and Julian Eggert},
  journal = {arXiv preprint arXiv:2505.06328},
  year    = {2025}
}