How Syntax Specialization Emerges in Language Models

26 May 2025
Xufeng Duan
Zhaoqian Yao
Yunhao Zhang
Shaonan Wang
Zhenguang G. Cai
Main: 7 pages · 12 figures · Bibliography: 3 pages · Appendix: 4 pages
Abstract

Large language models (LLMs) have been found to develop surprising internal specializations: individual neurons, attention heads, and circuits become selectively sensitive to syntactic structure, mirroring patterns observed in the human brain. While this specialization is well documented, how it emerges during training and what influences its development remain largely unknown. In this work, we open the black box of specialization by tracking its formation over time. By quantifying internal syntactic consistency across minimal pairs from various syntactic phenomena, we identify a clear developmental trajectory: syntactic sensitivity emerges gradually, concentrates in specific layers, and exhibits a 'critical period' of rapid internal specialization. This process is consistent across architectures and initialization parameters (e.g., random seeds) and is influenced by model scale and training data. We thereby reveal not only where syntax arises in LLMs but also how some models internalize it during training. To support future research, we will release the code, models, and training checkpoints upon acceptance.
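The core measurement the abstract describes, comparing a model's internal activations on the two halves of a minimal pair, layer by layer, can be sketched as follows. This is an illustrative reconstruction, not the authors' released code: the function name, the cosine-distance metric, and the random arrays standing in for real hidden states are all assumptions made for the example.

```python
import numpy as np

def layerwise_syntactic_sensitivity(h_gram, h_ungram):
    """Per-layer divergence between hidden states of one minimal pair.

    h_gram, h_ungram: arrays of shape (n_layers, hidden_dim), e.g. the
    final-token hidden state at each layer for the grammatical and the
    ungrammatical sentence. Returns one cosine-distance score per layer;
    a higher score means that layer separates the pair more strongly.
    """
    dots = np.sum(h_gram * h_ungram, axis=1)
    norms = np.linalg.norm(h_gram, axis=1) * np.linalg.norm(h_ungram, axis=1)
    return 1.0 - dots / norms

# Toy stand-ins for real activations (12 layers, hidden size 64).
rng = np.random.default_rng(0)
h_a = rng.normal(size=(12, 64))
h_b = h_a + 0.1 * rng.normal(size=(12, 64))  # near-identical pair
scores = layerwise_syntactic_sensitivity(h_a, h_b)
print(scores.shape)  # one sensitivity score per layer
```

Tracking such per-layer scores across training checkpoints is one plausible way to observe where sensitivity concentrates and when the rapid-specialization phase occurs.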

@article{duan2025_2505.19548,
  title={How Syntax Specialization Emerges in Language Models},
  author={Xufeng Duan and Zhaoqian Yao and Yunhao Zhang and Shaonan Wang and Zhenguang G. Cai},
  journal={arXiv preprint arXiv:2505.19548},
  year={2025}
}