Memori: A Persistent Memory Layer for Efficient, Context-Aware LLM Agents

20 March 2026

Luiz C. Borro

Luiz A. B. Macarini

Gordon Tindall

Michael Montero

Adam B. Struck

RALM

KELM


Main: 6 Pages

2 Figures

4 Tables

Appendix: 3 Pages

Abstract

As large language models (LLMs) evolve into autonomous agents, persistent memory at the API layer is essential for enabling context-aware behavior across LLMs and multi-session interactions. Existing approaches force vendor lock-in and rely on injecting large volumes of raw conversation into prompts, leading to high token costs and degraded performance. We introduce Memori, an LLM-agnostic persistent memory layer that treats memory as a data structuring problem. Its Advanced Augmentation pipeline converts unstructured dialogue into compact semantic triples and conversation summaries, enabling precise retrieval and coherent reasoning. Evaluated on the LoCoMo benchmark, Memori achieves 81.95% accuracy, outperforming existing memory systems while using only 1,294 tokens per query (~5% of full context). This results in substantial cost reductions, including 67% fewer tokens than competing approaches and over 20x savings compared to full-context methods. These results show that effective memory in LLM agents depends on structured representations instead of larger context windows, enabling scalable and cost-efficient deployment.
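
The token figures quoted in the abstract can be cross-checked with a short back-of-the-envelope calculation. The sketch below is illustrative only: the variable names are hypothetical, the derived values are implied by the rounded percentages in the abstract and are not numbers reported by the authors, and it assumes the "20x savings" tracks token count directly.

```python
# Figures quoted in the abstract.
memori_tokens = 1294            # tokens per query
full_context_fraction = 0.05    # "~5% of full context" (approximate)
reduction_vs_competing = 0.67   # "67% fewer tokens than competing approaches"

# Implied quantities (assumptions derived from the rounded figures above,
# not values reported in the abstract).
implied_full_context = memori_tokens / full_context_fraction
implied_savings_factor = 1 / full_context_fraction
implied_competing_tokens = memori_tokens / (1 - reduction_vs_competing)

print(f"Implied full-context size: ~{implied_full_context:,.0f} tokens")
print(f"Implied savings vs. full context: ~{implied_savings_factor:.0f}x")
print(f"Implied competing-system usage: ~{implied_competing_tokens:,.0f} tokens")
```

Under these assumptions, a 5% footprint corresponds to a roughly 20x reduction, which is consistent with the "over 20x" claim if the true fraction is slightly below 5%.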
