Not all layers are equally as important: Every Layer Counts BERT
Lucas Georges Gabriel Charpentier, David Samuel
arXiv: 2311.02265 · 3 November 2023

Papers citing "Not all layers are equally as important: Every Layer Counts BERT" (11 of 11 shown)

Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning
Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider
04 Jun 2025

Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models
Lennart Stöpler, Rufat Asadli, Mitja Nikolaus, Ryan Cotterell, Alex Warstadt
09 May 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
10 Apr 2025

BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context
Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed, Francois Meyer
08 Jan 2025

GPT or BERT: why not both?
Lucas Georges Gabriel Charpentier, David Samuel
31 Dec 2024

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox
06 Dec 2024

From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes
Zébulon Goriely, Richard Diehl Martinez, Andrew Caines, Lisa Beinborn, P. Buttery
30 Oct 2024

Team Ryu's Submission to SIGMORPHON 2024 Shared Task on Subword Tokenization
Zilong Li
19 Oct 2024

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Boyao Wang, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
26 Mar 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi
04 Feb 2024

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem H. Zuidema, Jaap Jumelet
05 Oct 2023