Transformer Dynamics: A neuroscientific approach to interpretability of large language models

17 February 2025
Jesseba Fernando
Grigori Guitchounts
Abstract

As artificial intelligence models have exploded in scale and capability, understanding their internal mechanisms remains a critical challenge. Inspired by the success of dynamical systems approaches in neuroscience, here we propose a novel framework for studying computations in deep learning systems. We focus on the residual stream (RS) in transformer models, conceptualizing it as a dynamical system evolving across layers. We find that activations of individual RS units exhibit strong continuity across layers, despite the RS being a non-privileged basis. Activations in the RS accelerate and grow denser over layers, while individual units trace unstable periodic orbits. In reduced-dimensional spaces, the RS follows a curved trajectory with attractor-like dynamics in the lower layers. These insights bridge dynamical systems theory and mechanistic interpretability, establishing a foundation for a "neuroscience of AI" that combines theoretical rigor with large-scale data analysis to advance our understanding of modern neural networks.
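
The sketch below illustrates, in code, the kind of analysis the abstract describes: treating the residual stream as a state that evolves layer by layer, then measuring per-unit continuity between adjacent layers and the layer-to-layer "speed" of the trajectory. It is not the authors' code; the model choice (`gpt2`), the prompt, and the specific correlation and norm metrics are illustrative assumptions.

```python
# Minimal sketch: inspect residual-stream dynamics across layers of a
# decoder-only transformer via Hugging Face Transformers.
# Assumptions (not from the paper): model = gpt2, a short example prompt,
# Pearson-style per-unit correlation and update-norm as the metrics.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # any causal LM that exposes hidden states will do
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

inputs = tok("The residual stream evolves as a dynamical system across layers.",
             return_tensors="pt")
with torch.no_grad():
    out = model(**inputs)

# hidden_states is a tuple of (n_layers + 1) tensors, each [batch, seq, d_model];
# these are the residual-stream states before/after each transformer block.
rs = torch.stack(out.hidden_states, dim=0).squeeze(1)  # [layers+1, seq, d_model]

def layerwise_unit_correlation(rs):
    """Mean correlation (over units) of each unit's activation pattern
    across tokens, between adjacent layers -- a rough continuity measure."""
    corrs = []
    for l in range(rs.shape[0] - 1):
        a, b = rs[l], rs[l + 1]              # [seq, d_model]
        a = a - a.mean(0, keepdim=True)
        b = b - b.mean(0, keepdim=True)
        num = (a * b).sum(0)
        den = a.norm(dim=0) * b.norm(dim=0) + 1e-8
        corrs.append((num / den).mean().item())
    return corrs

# Layer-to-layer "speed": mean norm of the update added to the residual stream
# at each layer, a crude proxy for the acceleration the paper reports.
speed = (rs[1:] - rs[:-1]).norm(dim=-1).mean(dim=-1)  # [layers]

print("continuity per layer:", layerwise_unit_correlation(rs))
print("update norm per layer:", speed.tolist())
```

A reduced-dimensional view of the trajectory, as mentioned in the abstract, could be obtained by fitting PCA to the stacked states in `rs` and plotting the projected layer-by-layer path, though the paper's exact dimensionality-reduction setup is not specified here.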

@article{fernando2025_2502.12131,
  title={Transformer Dynamics: A neuroscientific approach to interpretability of large language models},
  author={Jesseba Fernando and Grigori Guitchounts},
  journal={arXiv preprint arXiv:2502.12131},
  year={2025}
}