Studying Small Language Models with Susceptibilities

25 April 2025
Garrett Baker
George Wang
Jesse Hoogland
Daniel Murfet
Abstract

We develop a linear response framework for interpretability that treats a neural network as a Bayesian statistical mechanical system. A small, controlled perturbation of the data distribution, for example shifting the Pile toward GitHub or legal text, induces a first-order change in the posterior expectation of an observable localized on a chosen component of the network. The resulting susceptibility can be estimated efficiently with local SGLD samples and factorizes into signed, per-token contributions that serve as attribution scores. Building a set of perturbations (probes) yields a response matrix whose low-rank structure separates functional modules such as multigram and induction heads in a 3M-parameter transformer. Susceptibilities link local learning coefficients from singular learning theory with linear-response theory, and quantify how local loss landscape geometry deforms under shifts in the data distribution.
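The estimation pipeline sketched in the abstract can be illustrated with a short, hypothetical example. This is not the paper's code: it assumes the susceptibility reduces, in fluctuation-dissipation style, to a negative posterior covariance between a component-localized observable and the perturbation-induced loss shift, both evaluated on SGLD draws from the local posterior. The helper names (estimate_susceptibility, response_matrix) and the toy data are invented for illustration.

import numpy as np

def estimate_susceptibility(obs_samples, delta_loss_samples, beta=1.0):
    """Estimate one susceptibility as a (negative) posterior covariance.

    obs_samples:        shape (n_draws,), the component-localized observable
                        evaluated on each SGLD draw from the local posterior.
    delta_loss_samples: shape (n_draws,), the first-order change in empirical
                        loss induced by the data-distribution perturbation,
                        evaluated on the same draws.
    """
    obs = np.asarray(obs_samples, dtype=float)
    dl = np.asarray(delta_loss_samples, dtype=float)
    # Fluctuation-dissipation-style estimator: covariance over posterior draws.
    cov = np.mean((obs - obs.mean()) * (dl - dl.mean()))
    return -beta * cov

def response_matrix(obs_matrix, delta_loss_matrix, beta=1.0):
    """Assemble a probe-by-component response matrix of susceptibilities.

    obs_matrix:        shape (n_draws, n_components)
    delta_loss_matrix: shape (n_draws, n_probes)
    Returns an (n_probes, n_components) matrix.
    """
    O = obs_matrix - obs_matrix.mean(axis=0, keepdims=True)
    D = delta_loss_matrix - delta_loss_matrix.mean(axis=0, keepdims=True)
    return -beta * (D.T @ O) / O.shape[0]

# Toy usage: random draws standing in for SGLD samples.
rng = np.random.default_rng(0)
n_draws, n_components, n_probes = 512, 8, 5
obs = rng.normal(size=(n_draws, n_components))
dloss = obs[:, :n_probes] * 0.3 + rng.normal(size=(n_draws, n_probes))
R = response_matrix(obs, dloss)
# Inspect the low-rank structure of the response matrix via its SVD.
U, S, Vt = np.linalg.svd(R, full_matrices=False)
print("singular values:", np.round(S, 3))

Stacking one such covariance per (probe, component) pair gives the response matrix, and its singular-value spectrum is one way to surface the low-rank structure the abstract refers to.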

@article{baker2025_2504.18274,
  title={Studying Small Language Models with Susceptibilities},
  author={Garrett Baker and George Wang and Jesse Hoogland and Daniel Murfet},
  journal={arXiv preprint arXiv:2504.18274},
  year={2025}
}