
Born a Transformer -- Always a Transformer?

27 May 2025
Yana Veitsman
Mayank Jobanputra
Yash Sarrof
Aleksandra Bakalova
Vera Demberg
Ellie Pavlick
Michael Hahn
Main: 8 pages · Bibliography: 5 pages · Appendix: 17 pages · 15 figures · 9 tables
Abstract

Transformers have theoretical limitations in modeling certain sequence-to-sequence tasks, yet it remains largely unclear if these limitations play a role in large-scale pretrained LLMs, or whether LLMs might effectively overcome these constraints in practice due to the scale of both the models themselves and their pretraining data. We explore how these architectural constraints manifest after pretraining, by studying a family of retrieval and copying tasks inspired by Liu et al. [2024]. We use the recently proposed C-RASP framework for studying length generalization [Huang et al., 2025b] to provide guarantees for each of our settings. Empirically, we observe an induction-versus-anti-induction asymmetry, where pretrained models are better at retrieving tokens to the right (induction) rather than the left (anti-induction) of a query token. This asymmetry disappears upon targeted fine-tuning if length generalization is guaranteed by theory. Mechanistic analysis reveals that this asymmetry is connected to differences in the strength of induction versus anti-induction circuits within pretrained Transformers. We validate our findings through practical experiments on real-world tasks demonstrating reliability risks. Our results highlight that pretraining selectively enhances certain Transformer capabilities, but does not overcome fundamental length-generalization limits.
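As a rough illustration of the retrieval setup the abstract describes, the sketch below generates synthetic induction-style examples (retrieve the token immediately to the right of a query token) and anti-induction-style examples (retrieve the token to its left). The function name, prompt format, and token alphabet are illustrative assumptions, not the paper's exact protocol.

```python
# Minimal sketch of a retrieval probe in the spirit of the abstract
# (hypothetical task format; the paper's exact prompts may differ).
# Induction: return the token immediately to the RIGHT of the query token.
# Anti-induction: return the token immediately to its LEFT.
import random
import string


def make_retrieval_example(length=20, direction="right", seed=None):
    """Build one synthetic retrieval example.

    direction="right" -> induction-style retrieval (token after the query)
    direction="left"  -> anti-induction-style retrieval (token before the query)
    """
    rng = random.Random(seed)
    # Sample distinct tokens so the query occurs exactly once and the
    # target is unambiguous (length must be <= 26 for lowercase letters).
    tokens = rng.sample(string.ascii_lowercase, k=length)
    # Pick a query position away from the edges so both neighbours exist.
    pos = rng.randrange(1, length - 1)
    query = tokens[pos]
    answer = tokens[pos + 1] if direction == "right" else tokens[pos - 1]
    prompt = " ".join(tokens) + f" | query: {query} ->"
    return prompt, answer


if __name__ == "__main__":
    for direction in ("right", "left"):
        prompt, answer = make_retrieval_example(direction=direction, seed=0)
        print(direction, "|", prompt, "| expected:", answer)
```

Under the paper's finding, a pretrained model would be expected to score noticeably higher on the "right" (induction) variant than on the "left" (anti-induction) variant at longer lengths, until targeted fine-tuning closes the gap where theory guarantees length generalization.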

@article{veitsman2025_2505.21785,
  title={Born a Transformer -- Always a Transformer?},
  author={Yana Veitsman and Mayank Jobanputra and Yash Sarrof and Aleksandra Bakalova and Vera Demberg and Ellie Pavlick and Michael Hahn},
  journal={arXiv preprint arXiv:2505.21785},
  year={2025}
}