Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in BabyLlama with Reverse KL Divergence

29 October 2024

Papers citing "Choosy Babies Need One Coach: Inducing Mode-Seeking Behavior in BabyLlama with Reverse KL Divergence"

8 / 8 papers shown

Title
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora Alex Warstadt Aaron Mueller Leshem Choshen E. Wilcox Chengxu Zhuang ... Rafael Mosquera Bhargavi Paranjape Adina Williams Tal Linzen Ryan Cotterell 190 120 0 10 Apr 2025
Can training neural language models on a curriculum with developmentally plausible data improve alignment with human reading behavior? Aryaman Chobey Oliver Smith Anzi Wang Grusha Prasad 121 5 0 30 Nov 2023
CLIMB: Curriculum Learning for Infant-inspired Model Building Richard Diehl Martinez Zébulon Goriely Hope McGovern Christopher Davis Andrew Caines P. Buttery Lisa Beinborn 77 13 0 15 Nov 2023
Forward and inverse reinforcement learning sharing network weights and hyperparameters E. Uchibe Kenji Doya 54 18 0 17 Aug 2020
BLiMP: The Benchmark of Linguistic Minimal Pairs for English Alex Warstadt Alicia Parrish Haokun Liu Anhad Mohananey Wei Peng Sheng-Fu Wang Samuel R. Bowman 100 495 0 02 Dec 2019
SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems Alex Jinpeng Wang Yada Pruksachatkun Nikita Nangia Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 279 2,326 0 02 May 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding Jacob Devlin Ming-Wei Chang Kenton Lee Kristina Toutanova VLM SSL SSeg 1.8K 95,324 0 11 Oct 2018
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding Alex Jinpeng Wang Amanpreet Singh Julian Michael Felix Hill Omer Levy Samuel R. Bowman ELM 1.1K 7,201 0 20 Apr 2018