Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2308.02019
Cited By
Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty
3 August 2023
I. Timiryasov
J. Tastet
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Baby Llama: knowledge distillation from an ensemble of teachers trained on a small dataset with no performance penalty"
8 / 8 papers shown
Title
Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt
Aaron Mueller
Leshem Choshen
E. Wilcox
Chengxu Zhuang
...
Rafael Mosquera
Bhargavi Paranjape
Adina Williams
Tal Linzen
Ryan Cotterell
40
108
0
10 Apr 2025
Pastiche Novel Generation Creating: Fan Fiction You Love in Your Favorite Author's Style
Xueran Han
Yuhan Liu
Mingzhe Li
Wei Liu
Sen Hu
Rui Yan
Zhiqiang Xu
Xiuying Chen
69
0
0
24 Feb 2025
BERTtime Stories: Investigating the Role of Synthetic Story Data in Language Pre-training
Nikitas Theodoropoulos
Giorgos Filandrianos
Vassilis Lyberatos
Maria Lymperaiou
Giorgos Stamou
SyDa
57
1
0
24 Feb 2025
Towards Cross-Tokenizer Distillation: the Universal Logit Distillation Loss for LLMs
Nicolas Boizard
Kevin El Haddad
C´eline Hudelot
Pierre Colombo
75
15
0
28 Jan 2025
iServe: An Intent-based Serving System for LLMs
Dimitrios Liakopoulos
Tianrui Hu
Prasoon Sinha
N. Yadwadkar
VLM
185
0
0
08 Jan 2025
InfiFusion: A Unified Framework for Enhanced Cross-Model Reasoning via LLM Fusion
Zhaoyi Yan
Zhijie Sang
Y. Zhang
Yuhao Fu
Baoyi He
Qi Zhou
Yining Di
Chunlin Ji
Shengyu Zhang
Fei Wu
MoMe
LRM
64
1
0
06 Jan 2025
GPT or BERT: why not both?
Lucas Georges Gabriel Charpentier
David Samuel
55
5
0
31 Dec 2024
Sharpness-Aware Minimization Improves Language Model Generalization
Dara Bahri
H. Mobahi
Yi Tay
133
98
0
16 Oct 2021
1