Understanding Task Representations in Neural Networks via Bayesian Ablation

6 pages main text, 4-page appendix, 2-page bibliography; 7 figures, 1 table
Abstract
Neural networks are powerful tools for cognitive modeling due to their flexibility and emergent properties. However, interpreting their learned representations remains challenging because of their sub-symbolic semantics. In this work, we introduce a novel probabilistic framework for interpreting latent task representations in neural networks. Inspired by Bayesian inference, our approach defines a distribution over representational units to infer their causal contributions to task performance. Drawing on ideas from information theory, we propose a suite of tools and metrics that illuminate key model properties, including representational distributedness, manifold complexity, and polysemanticity.
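The abstract describes placing a distribution over representational units and inferring each unit's causal contribution to task performance. Below is a minimal sketch of one way such Bayesian ablation could look, assuming Bernoulli ablation masks over hidden units and an importance-weighted posterior; the network, dataset, temperature, and all names are illustrative stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of Bayesian ablation: sample Bernoulli masks over
# hidden units, score task performance under each mask, and compare each
# unit's posterior inclusion probability against the prior.
import torch
import torch.nn as nn

torch.manual_seed(0)

# A small stand-in network; the `hidden_dim` units are the ablation targets.
hidden_dim = 32
model = nn.Sequential(nn.Linear(10, hidden_dim), nn.ReLU(), nn.Linear(hidden_dim, 2))

def task_performance(mask, x, y):
    """Accuracy with hidden units zeroed out wherever mask == 0."""
    h = torch.relu(model[0](x)) * mask   # ablate masked-out units
    logits = model[2](h)
    return (logits.argmax(dim=-1) == y).float().mean()

# Toy data standing in for a task dataset.
x = torch.randn(256, 10)
y = torch.randint(0, 2, (256,))

# Prior: each unit independently survives ablation with probability p.
p = 0.5
n_samples = 1000
masks = torch.bernoulli(torch.full((n_samples, hidden_dim), p))
scores = torch.stack([task_performance(m, x, y) for m in masks])

# Importance-style posterior: weight masks by performance (the softmax
# temperature 0.05 is an assumption). Units whose posterior inclusion
# probability exceeds the prior p contribute causally to the task.
weights = torch.softmax(scores / 0.05, dim=0)
posterior_inclusion = (weights[:, None] * masks).sum(dim=0)
contribution = posterior_inclusion - p
print(contribution)
```

Under this reading, per-unit contributions concentrated in a few units would indicate a localized representation, while a flat contribution profile would indicate a distributed one.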
@article{nam2025_2505.13742,
  title={Understanding Task Representations in Neural Networks via Bayesian Ablation},
  author={Andrew Nam and Declan Campbell and Thomas Griffiths and Jonathan Cohen and Sarah-Jane Leslie},
  journal={arXiv preprint arXiv:2505.13742},
  year={2025}
}