Advantage Alignment Algorithms

20 June 2024

Juan Agustin Duque

Milad Aghajohari

Tim Cooijmans

Razvan Ciuca

Tianyu Zhang

Gauthier Gidel

Aaron Courville

Abstract

Artificially intelligent agents are increasingly being integrated into human decision-making: from large language model (LLM) assistants to autonomous vehicles. These systems often optimize their individual objective, leading to conflicts, particularly in general-sum games where naive reinforcement learning agents empirically converge to Pareto-suboptimal Nash equilibria. To address this issue, opponent shaping has emerged as a paradigm for finding socially beneficial equilibria in general-sum games. In this work, we introduce Advantage Alignment, a family of algorithms derived from first principles that perform opponent shaping efficiently and intuitively. We achieve this by aligning the advantages of interacting agents, increasing the probability of mutually beneficial actions when their interaction has been positive. We prove that existing opponent shaping methods implicitly perform Advantage Alignment. Compared to these methods, Advantage Alignment simplifies the mathematical formulation of opponent shaping, reduces the computational burden and extends to continuous action domains. We demonstrate the effectiveness of our algorithms across a range of social dilemmas, achieving state-of-the-art cooperation and robustness against exploitation.
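The abstract's core mechanism, raising the probability of mutually beneficial actions when the interaction so far has been positive, can be illustrated with a small weighting scheme. The following is a minimal sketch under stated assumptions, not the paper's exact update: the function name `alignment_weights`, the discount `gamma`, and the mixing coefficient `beta` are hypothetical, and the specific form (own advantage plus the product of discounted past own advantages and the other agent's current advantage) is one plausible reading of "aligning the advantages". Each returned weight would multiply the gradient of the log-probability of the action taken at that step.

```python
def alignment_weights(own_adv, opp_adv, gamma=0.96, beta=1.0):
    """Per-step weights for an agent's log-probability gradient (illustrative).

    own_adv[t]: the agent's own advantage estimate at step t.
    opp_adv[t]: the other agent's advantage estimate at step t.
    Returns own_adv[t] + beta * history[t] * opp_adv[t], where history[t]
    is the discounted sum of the agent's own advantages before step t.
    All names and the exact form are assumptions made for this sketch.
    """
    if len(own_adv) != len(opp_adv):
        raise ValueError("advantage sequences must have the same length")
    weights = []
    history = 0.0  # sum over k < t of gamma**(t - k) * own_adv[k]
    for a_own, a_opp in zip(own_adv, opp_adv):
        # Alignment term: positive when past interaction and the other
        # agent's current advantage have the same sign.
        weights.append(a_own + beta * history * a_opp)
        # Fold the current step into the discounted history for step t + 1.
        history = gamma * (history + a_own)
    return weights


# Positive history makes an action that benefits the other agent more likely.
good_history = alignment_weights([1.0, 1.0], [0.0, 2.0], gamma=0.5)
# Negative history makes the same action less likely.
bad_history = alignment_weights([-1.0, 0.0], [0.0, 2.0], gamma=0.5)
```

With a positive history the second-step weight increases, and with a negative history it decreases, which matches the qualitative behavior the abstract describes. Setting `beta=0.0` recovers a plain advantage-weighted policy gradient in this sketch.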


@article{duque2025_2406.14662,
  title={Advantage Alignment Algorithms},
  author={Juan Agustin Duque and Milad Aghajohari and Tim Cooijmans and Razvan Ciuca and Tianyu Zhang and Gauthier Gidel and Aaron Courville},
  journal={arXiv preprint arXiv:2406.14662},
  year={2025}
}
