
Design Patterns for Securing LLM Agents against Prompt Injections

Abstract

As AI agents powered by Large Language Models (LLMs) become increasingly versatile and capable of addressing a broad spectrum of tasks, ensuring their security has become a critical challenge. Among the most pressing threats are prompt injection attacks, which exploit the agent's reliance on natural language inputs -- an especially dangerous threat when agents are granted tool access or handle sensitive information. In this work, we propose a set of principled design patterns for building AI agents with provable resistance to prompt injection. We systematically analyze these patterns, discuss their trade-offs in terms of utility and security, and illustrate their real-world applicability through a series of case studies.
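To make the threat model concrete, the sketch below illustrates one way an agent design can limit what untrusted text is allowed to influence: the agent commits to a fixed tool-call plan derived only from the trusted user request, so instructions injected into retrieved content cannot add or alter tool calls. This is a minimal illustrative sketch, not code from the paper; the function names (plan_with_llm, summarize_with_llm), the tool registry, and the stubbed LLM calls are all assumptions.

```python
# Illustrative "plan-then-execute"-style sketch (assumed pattern, not code from the paper).
# The agent fixes its tool-call plan from the trusted user request alone; text returned
# by tools is treated as data and can no longer add or change tool calls.

from dataclasses import dataclass
from typing import Callable

# Hypothetical tool registry: names and callables are placeholders for this sketch.
TOOLS: dict[str, Callable[[str], str]] = {
    "search_docs": lambda query: f"<contents of documents matching '{query}'>",
    "send_email": lambda body: f"<email sent with body '{body}'>",
}

@dataclass
class PlannedCall:
    tool: str
    argument: str

def plan_with_llm(user_request: str) -> list[PlannedCall]:
    """Placeholder for an LLM call that sees ONLY the trusted user request."""
    return [PlannedCall("search_docs", user_request)]

def summarize_with_llm(user_request: str, tool_outputs: list[str]) -> str:
    """Placeholder for a quarantined LLM call: it may read untrusted tool
    output, but it has no ability to invoke further tools."""
    return f"Summary for '{user_request}' based on {len(tool_outputs)} result(s)."

def run_agent(user_request: str) -> str:
    plan = plan_with_llm(user_request)   # plan is fixed before any untrusted data is read
    outputs = []
    for call in plan:
        if call.tool not in TOOLS:       # reject tools outside the fixed allow-list
            raise ValueError(f"tool '{call.tool}' not permitted")
        outputs.append(TOOLS[call.tool](call.argument))
    return summarize_with_llm(user_request, outputs)

if __name__ == "__main__":
    print(run_agent("find our Q3 security review notes"))
```

The point of the sketch is the control flow rather than the stubs: because the plan is derived solely from trusted input and executed against a fixed allow-list, a prompt injection hidden in the search_docs output can at most distort the final summary; it cannot trigger additional tool calls.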

@article{beurer-kellner2025_2506.08837,
  title={Design Patterns for Securing LLM Agents against Prompt Injections},
  author={Luca Beurer-Kellner and Beat Buesser and Ana-Maria Creţu and Edoardo Debenedetti and Daniel Dobos and Daniel Fabian and Marc Fischer and David Froelicher and Kathrin Grosse and Daniel Naeff and Ezinwanne Ozoani and Andrew Paverd and Florian Tramèr and Václav Volhejn},
  journal={arXiv preprint arXiv:2506.08837},
  year={2025}
}
Main: 27 pages
Figures: 7
Tables: 1
Bibliography: 4 pages
Appendix: 1 page