Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty

3 August 2023

Papers citing "Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty"

Title
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora Alex Warstadt Aaron Mueller Leshem Choshen E. Wilcox Chengxu Zhuang ... Rafael Mosquera Bhargavi Paranjape Adina Williams Tal Linzen Ryan Cotterell 40 108 0 10 Apr 2025
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style Xueran Han Yuhan Liu Mingzhe Li Wei Liu Sen Hu Rui Yan Zhiqiang Xu Xiuying Chen 69 0 0 24 Feb 2025
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training Nikitas Theodoropoulos Giorgos Filandrianos Vassilis Lyberatos Maria Lymperaiou Giorgos Stamou SyDa 57 1 0 24 Feb 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs Nicolas Boizard Kevin El Haddad Céline Hudelot Pierre Colombo 75 15 0 28 Jan 2025
iServe: An Intent-based Serving System for LLMs Dimitrios Liakopoulos Tianrui Hu Prasoon Sinha N. Yadwadkar VLM 185 0 0 08 Jan 2025
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion Zhaoyi Yan Zhijie Sang Y. Zhang Yuhao Fu Baoyi He Qi Zhou Yining Di Chunlin Ji Shengyu Zhang Fei Wu MoMe LRM 64 1 0 06 Jan 2025
GPT or BERT: why not both? Lucas Georges Gabriel Charpentier David Samuel 55 5 0 31 Dec 2024
Sharpness-Aware Minimization Improves Language Model Generalization Dara Bahri H. Mobahi Yi Tay 133 98 0 16 Oct 2021