ResearchTrend.AI

An Affective-Taxis Hypothesis for Alignment and Interpretability

3 May 2025
Eli Sennesh
Maxwell Ramstead
Abstract

AI alignment is a field of research that aims to develop methods to ensure that agents always behave in a manner aligned with (i.e., consistent with) the goals and values of their human operators, no matter their level of capability. This paper proposes an affectivist approach to the alignment problem, reframing the concepts of goals and values in terms of affective taxis, and explaining the emergence of affective valence by appealing to recent work in evolutionary-developmental and computational neuroscience. We review the state of the art and, building on this work, propose a computational model of affect based on taxis navigation. We discuss evidence from a tractable model organism that our model reflects aspects of biological taxis navigation. We conclude with a discussion of the role of affective taxis in AI alignment.
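To make the core idea of taxis navigation concrete: in biology, a taxis is movement up (or down) a stimulus gradient, such as a bacterium climbing a chemoattractant field. The sketch below is not the authors' model; it is a minimal, hypothetical illustration of gradient-following taxis, with an assumed Gaussian "attractant" field and fixed-length steps along the locally estimated gradient.

```python
import math

def concentration(x, y):
    # Hypothetical attractant field: a Gaussian peak at the origin.
    return math.exp(-(x ** 2 + y ** 2) / 50.0)

def gradient(x, y, h=1e-4):
    # Central-difference estimate of the local gradient,
    # standing in for an organism's local sensing.
    gx = (concentration(x + h, y) - concentration(x - h, y)) / (2 * h)
    gy = (concentration(x, y + h) - concentration(x, y - h)) / (2 * h)
    return gx, gy

def taxis(x, y, steps=500, step_len=0.5):
    """Take fixed-length steps in the direction of increasing
    concentration (a deterministic caricature of klinotaxis)."""
    for _ in range(steps):
        gx, gy = gradient(x, y)
        norm = math.hypot(gx, gy)
        if norm < 1e-12:  # at (or numerically on) the peak
            break
        x += step_len * gx / norm
        y += step_len * gy / norm
    return x, y

# Starting far from the peak, the agent climbs the gradient
# and ends up oscillating within one step-length of the source.
ex, ey = taxis(10.0, 10.0)
```

Affective valence, on the authors' framing, plays the role of the sensed gradient: it signals whether the agent's situation is improving or worsening, steering behavior without an explicit map of the goal.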

@article{sennesh2025_2505.17024,
  title={An Affective-Taxis Hypothesis for Alignment and Interpretability},
  author={Eli Sennesh and Maxwell Ramstead},
  journal={arXiv preprint arXiv:2505.17024},
  year={2025}
}