Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems

Abstract

Large Language Model-based Multi-Agent Systems (LLM-MAS) have demonstrated strong capabilities in solving complex tasks but remain vulnerable when agents receive unreliable messages. This vulnerability stems from a fundamental gap: LLM agents treat all incoming messages equally without evaluating their trustworthiness. While some existing studies address trustworthiness, they focus on a single type of harmfulness rather than analyzing it holistically from multiple trustworthiness perspectives. In this work, we propose Attention Trust Score (A-Trust), a lightweight, attention-based method for evaluating message trustworthiness. Inspired by the human communication literature [1], we systematically analyze attention behaviors across six orthogonal trust dimensions and find that certain attention heads in the LLM specialize in detecting specific types of violations. Leveraging these insights, A-Trust infers trustworthiness directly from internal attention patterns without requiring external prompts or verifiers. Building upon A-Trust, we develop a principled and efficient trust management system (TMS) for LLM-MAS, enabling both message-level and agent-level trust assessment. Experiments across diverse multi-agent settings and tasks demonstrate that applying our TMS significantly enhances robustness against malicious inputs.
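
As a rough illustration of the message-level idea only (not the authors' implementation), the sketch below scores an incoming message by reading the attention that a few selected heads of a HuggingFace causal LM place on the message tokens. The model name, the dimension-to-head mapping, and the scoring rule are all placeholder assumptions for illustration.

```python
# A minimal, hypothetical sketch of attention-based message scoring.
# It is NOT the paper's released implementation: the model, the
# dimension-to-head mapping, and the scoring rule are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "gpt2"  # placeholder; any causal LM that returns attentions works

# Hypothetical (layer, head) pairs per trust dimension. The paper reports that
# specialized heads exist; the actual indices are model-specific and would be
# identified by profiling the model, not hard-coded like this.
DIMENSION_HEADS = {
    "factuality": [(5, 3), (8, 9)],
    "consistency": [(10, 1)],
    # ... remaining dimensions omitted in this sketch
}

def attention_trust_scores(message: str, context: str, model, tokenizer) -> dict:
    """Score a message per dimension by how much attention the selected heads
    place on the message tokens from the final position of the prompt."""
    prompt = context + "\n" + message
    inputs = tokenizer(prompt, return_tensors="pt")
    # Approximate length of the message span (joint tokenization may differ slightly).
    msg_len = len(tokenizer(message, add_special_tokens=False)["input_ids"])
    seq_len = inputs["input_ids"].shape[1]
    msg_slice = slice(seq_len - msg_len, seq_len)

    with torch.no_grad():
        out = model(**inputs, output_attentions=True)

    scores = {}
    for dim, heads in DIMENSION_HEADS.items():
        vals = []
        for layer, head in heads:
            attn = out.attentions[layer][0, head]          # (seq_len, seq_len)
            vals.append(attn[-1, msg_slice].sum().item())  # mass on the message span
        scores[dim] = sum(vals) / len(vals)
    return scores

if __name__ == "__main__":
    tok = AutoTokenizer.from_pretrained(MODEL_NAME)
    lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()
    print(attention_trust_scores("The capital of France is Berlin.",
                                 "Agent B reports:", lm, tok))
```

Agent-level trust, as mentioned in the abstract, could then be maintained by aggregating such per-message scores over time (e.g., a running average per sending agent); that aggregation rule is likewise only an assumption here, not the paper's TMS.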

@article{he2025_2506.02546,
  title={Attention Knows Whom to Trust: Attention-based Trust Management for LLM Multi-Agent Systems},
  author={Pengfei He and Zhenwei Dai and Xianfeng Tang and Yue Xing and Hui Liu and Jingying Zeng and Qiankun Peng and Shrivats Agrawal and Samarth Varshney and Suhang Wang and Jiliang Tang and Qi He},
  journal={arXiv preprint arXiv:2506.02546},
  year={2025}
}