Not all layers are equally as important: Every Layer Counts BERT
Lucas Georges Gabriel Charpentier, David Samuel
arXiv: 2311.02265 · 3 November 2023

Papers citing "Not all layers are equally as important: Every Layer Counts BERT" (11 of 11 shown)

Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning
Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider
04 Jun 2025

Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models
Lennart Stöpler, Rufat Asadli, Mitja Nikolaus, Ryan Cotterell, Alex Warstadt
09 May 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
10 Apr 2025

BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context
Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed, Francois Meyer
08 Jan 2025

GPT or BERT: why not both?
Lucas Georges Gabriel Charpentier, David Samuel
31 Dec 2024

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox
06 Dec 2024

From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes
Zébulon Goriely, Richard Diehl Martinez, Andrew Caines, Lisa Beinborn, P. Buttery
30 Oct 2024

Team Ryu's Submission to SIGMORPHON 2024 Shared Task on Subword Tokenization
Zilong Li
19 Oct 2024

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Boyao Wang, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
26 Mar 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi
04 Feb 2024

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem H. Zuidema, Jaap Jumelet
05 Oct 2023