Not all layers are equally as important: Every Layer Counts BERT
arXiv 2311.02265 (v2, latest)
3 November 2023
Lucas Georges Gabriel Charpentier, David Samuel

Papers citing "Not all layers are equally as important: Every Layer Counts BERT"

11 / 11 papers shown
Unpacking Let Alone: Human-Scale Models Generalize to a Rare Construction in Form but not Meaning
Wesley Scivetti, Tatsuya Aoyama, Ethan Wilcox, Nathan Schneider
04 Jun 2025

Towards Developmentally Plausible Rewards: Communicative Success as a Learning Signal for Interactive Language Models
Lennart Stöpler, Rufat Asadli, Mitja Nikolaus, Ryan Cotterell, Alex Warstadt
LRM
09 May 2025

Findings of the BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Alex Warstadt, Aaron Mueller, Leshem Choshen, E. Wilcox, Chengxu Zhuang, ..., Rafael Mosquera, Bhargavi Paranjape, Adina Williams, Tal Linzen, Ryan Cotterell
10 Apr 2025

BabyLMs for isiXhosa: Data-Efficient Language Modelling in a Low-Resource Context
Alexis Matzopoulos, Charl Hendriks, Hishaam Mahomed, Francois Meyer
08 Jan 2025

GPT or BERT: why not both?
Lucas Georges Gabriel Charpentier, David Samuel
31 Dec 2024

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora
Michael Y. Hu, Aaron Mueller, Candace Ross, Adina Williams, Tal Linzen, Chengxu Zhuang, Ryan Cotterell, Leshem Choshen, Alex Warstadt, Ethan Gotlieb Wilcox
06 Dec 2024

From Babble to Words: Pre-Training Language Models on Continuous Streams of Phonemes
Zébulon Goriely, Richard Diehl Martinez, Andrew Caines, Lisa Beinborn, P. Buttery
CLL
30 Oct 2024

Team Ryu's Submission to SIGMORPHON 2024 Shared Task on Subword Tokenization
Zilong Li
19 Oct 2024

LISA: Layerwise Importance Sampling for Memory-Efficient Large Language Model Fine-Tuning
Boyao Wang, Xiang Liu, Shizhe Diao, Renjie Pi, Jipeng Zhang, Chi Han, Tong Zhang
26 Mar 2024

DenseFormer: Enhancing Information Flow in Transformers via Depth Weighted Averaging
Matteo Pagliardini, Amirkeivan Mohtashami, François Fleuret, Martin Jaggi
04 Feb 2024

DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk, Hosein Mohebbi, Gabriele Sarti, Willem H. Zuidema, Jaap Jumelet
05 Oct 2023