Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

10 April 2025
Alex Warstadt
Aaron Mueller
Leshem Choshen
Ethan Wilcox
Chengxu Zhuang
Juan Ciro
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
Abstract

Children can acquire language from less than 100 million words of input. Large language models are far less data-efficient: they typically require 3 or 4 orders of magnitude more data and still do not perform as well as humans on many evaluations. These intensive resource demands limit the ability of researchers to train new models and use existing models as developmentally plausible cognitive models. The BabyLM Challenge is a communal effort in which participants compete to optimize language model training on a fixed data budget. Submissions are compared on various evaluation tasks targeting grammatical ability, downstream task performance, and generalization. Participants can submit to up to three tracks with progressively looser data restrictions. From over 30 submissions, we extract concrete recommendations on how best to train data-efficient language models, and on where future efforts should (and perhaps should not) focus. The winning submissions using the LTG-BERT architecture (Samuel et al., 2023) outperformed models trained on trillions of words. Other submissions achieved strong results through training on shorter input sequences or training a student model on a pretrained teacher. Curriculum learning attempts, which accounted for a large number of submissions, were largely unsuccessful, though some showed modest improvements.
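One strategy the abstract highlights is training a student model on a pretrained teacher, i.e. knowledge distillation. The sketch below is a minimal illustration of a standard distillation loss in PyTorch, not the method of any particular submission; the function name, tensor shapes, and hyperparameters (temperature, alpha) are illustrative assumptions.

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      temperature=2.0, alpha=0.5):
    """student_logits, teacher_logits: (num_tokens, vocab_size); labels: (num_tokens,)."""
    # Hard-label term: ordinary language-modeling cross-entropy against the data.
    ce = F.cross_entropy(student_logits, labels)
    # Soft-label term: KL divergence from the teacher's temperature-softened
    # distribution to the student's, scaled by T^2 as in standard distillation.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    return alpha * ce + (1.0 - alpha) * kl

In this kind of setup the student is trained on the fixed small corpus while the frozen teacher supplies soft targets; alpha trades off matching the gold tokens against matching the teacher's distribution.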

@article{warstadt2025_2504.08165,
  title={Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora},
  author={Alex Warstadt and Aaron Mueller and Leshem Choshen and Ethan Wilcox and Chengxu Zhuang and Juan Ciro and Rafael Mosquera and Bhargavi Paranjape and Adina Williams and Tal Linzen and Ryan Cotterell},
  journal={arXiv preprint arXiv:2504.08165},
  year={2025}
}