
AgentLeak: A Full-Stack Benchmark for Privacy Leakage in Multi-Agent LLM Systems

Faouzi El Yagoubi
Godwin Badu-Marfo
Ranwa Al Mallah
Main: 15 pages
11 figures
Bibliography: 2 pages
4 tables
Abstract

Multi-agent Large Language Model (LLM) systems create privacy risks that current benchmarks cannot measure. When agents coordinate on tasks, sensitive data passes through inter-agent messages, shared memory, and tool arguments, all pathways that output-only audits never inspect. We introduce AgentLeak, to the best of our knowledge the first full-stack benchmark for privacy leakage covering internal channels. It spans 1,000 scenarios across healthcare, finance, legal, and corporate domains, paired with a 32-class attack taxonomy and a three-tier detection pipeline. A factorial evaluation crossing five production LLMs (GPT-4o, GPT-4o-mini, Claude 3.5 Sonnet, Mistral Large, and Llama 3.3 70B) with all 1,000 scenarios, yielding 4,979 validated execution traces, reveals that multi-agent configurations reduce per-channel output leakage (C1: 27.2% vs 43.2% in single-agent) but introduce unmonitored internal channels that raise total system exposure to 68.9% (aggregated across C1, C2, C5). Internal channels account for most of this gap: inter-agent messages (C2) leak at 68.8%, compared to 27.2% on C1 (output channel). This means that output-only audits miss 41.7% of violations. Safety-aligned models achieve lower leakage on both external and internal channels, yet no model eliminates it. Across all five models and four domains, the pattern C2 ≥ C1 holds consistently, confirming that inter-agent communication is the primary vulnerability. These results establish that output-only auditing is fundamentally insufficient for multi-agent systems and that privacy controls must be extended to inter-agent communication channels.
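The gap between per-channel rates and total system exposure arises because a trace counts as exposed if it leaks on any channel, so the aggregated rate can exceed every individual channel's rate. A minimal sketch of that aggregation, using made-up traces rather than the paper's data (channel names C1/C2/C5 follow the abstract; everything else is illustrative):

```python
# Each hypothetical trace records the set of channels on which it leaked.
# C1 = final output, C2 = inter-agent messages, C5 = another internal channel.
traces = [
    {"C1"},          # leaked only on the audited output channel
    {"C2"},          # leaked only internally -- invisible to output-only audits
    {"C1", "C2"},    # leaked on both
    set(),           # no leak
    {"C2", "C5"},    # leaked on two internal channels
]

def channel_rate(channel: str) -> float:
    """Fraction of traces that leak on one specific channel."""
    return sum(channel in t for t in traces) / len(traces)

# Aggregated exposure: fraction of traces leaking on ANY channel.
total_exposure = sum(bool(t) for t in traces) / len(traces)

print(channel_rate("C1"))                   # output-only audit sees this
print(total_exposure)                       # true system-level exposure
print(total_exposure - channel_rate("C1"))  # violations an output-only audit misses
```

With these toy traces, an output-only audit reports 2/5 exposure while the system-level rate is 4/5, mirroring (in spirit only) the abstract's finding that output-only audits miss 41.7% of violations.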
