
Transferring Features Across Language Models With Model Stitching

7 June 2025
Alan Chen
Jack Merullo
Alessandro Stolfo
Ellie Pavlick
Main: 9 pages · Appendix: 12 pages · Bibliography: 5 pages · 17 figures · 8 tables
Abstract

In this work, we demonstrate that affine mappings between the residual streams of language models are a cheap yet effective way to transfer represented features between models. We apply this technique to transfer the weights of Sparse Autoencoders (SAEs) between models of different sizes in order to compare their representations. We find that small and large models learn highly similar representation spaces, which motivates training expensive components such as SAEs on a smaller model and transferring them to a larger model at a substantial savings in FLOPs. For example, using a small-to-large transferred SAE as an initialization can make training runs roughly 50% cheaper when training SAEs on larger models. Next, we show that transferred probes and steering vectors can effectively recover ground-truth performance. Finally, we dive deeper into feature-level transferability, finding that semantic and structural features transfer noticeably differently, while specific classes of functional features have their roles faithfully mapped. Overall, our findings illustrate similarities and differences in the linear representation spaces of small and large models and demonstrate a method for improving the training efficiency of SAEs.
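To make the stitching idea concrete, the sketch below fits least-squares affine maps between paired residual-stream activations of a small and a large model and composes them with a small-model SAE. This is a minimal illustration under assumptions, not the authors' released code: the dimensions, the ReLU SAE parameterization, and the random stand-in activations and weights are all hypothetical.

import torch

# --- illustrative sizes (assumptions, not taken from the paper) ---
d_small, d_large, n_tokens, n_feats = 512, 1024, 10_000, 4096

# Paired residual-stream activations on the same token positions,
# assumed to have been collected beforehand (e.g. with forward hooks).
# Random tensors stand in for real activations here.
X_s = torch.randn(n_tokens, d_small)   # small-model residual stream
X_l = torch.randn(n_tokens, d_large)   # large-model residual stream

def fit_affine(X, Y):
    """Least-squares affine map: Y ≈ X @ W + b."""
    A = torch.cat([X, torch.ones(len(X), 1)], dim=1)   # append bias column
    sol = torch.linalg.lstsq(A, Y).solution            # (d_in + 1, d_out)
    return sol[:-1], sol[-1]                           # W, b

W_up, b_up = fit_affine(X_s, X_l)   # small -> large stitch
W_dn, b_dn = fit_affine(X_l, X_s)   # large -> small stitch

# A toy ReLU SAE "trained" on the small model (random weights stand in).
W_enc = torch.randn(d_small, n_feats) / d_small**0.5   # encoder
b_enc = torch.zeros(n_feats)
W_dec = torch.randn(n_feats, d_small) / n_feats**0.5   # decoder

def transferred_sae(x_large):
    """Apply the small-model SAE to large-model activations:
    map down through the stitch, encode/decode, map back up."""
    x_small = x_large @ W_dn + b_dn
    feats = torch.relu(x_small @ W_enc + b_enc)
    recon_small = feats @ W_dec
    return recon_small @ W_up + b_up

x = torch.randn(8, d_large)
print(transferred_sae(x).shape)   # torch.Size([8, 1024])

The stitch maps could equally be folded into the SAE weights themselves (e.g. replacing W_enc with W_dn @ W_enc), which is the form one would use as an initialization for continued training on the larger model.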

@article{chen2025_2506.06609,
  title={Transferring Features Across Language Models With Model Stitching},
  author={Alan Chen and Jack Merullo and Alessandro Stolfo and Ellie Pavlick},
  journal={arXiv preprint arXiv:2506.06609},
  year={2025}
}