2404.07647
Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck
11 April 2024
Nathan Godey
Eric Villemonte de la Clergerie
Benoît Sagot
Papers citing
"Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck"
5 / 5 papers shown
Title
Small Language Models: Survey, Measurements, and Insights
Zhenyan Lu
Xiang Li
Dongqi Cai
Rongjie Yi
Fangming Liu
Xiwen Zhang
Nicholas D. Lane
Mengwei Xu
ObjD
LRM
58
36
0
24 Sep 2024
Norm of Mean Contextualized Embeddings Determines their Variance
Hiroaki Yamagiwa
Hidetoshi Shimodaira
27
0
0
17 Sep 2024
Outliers Dimensions that Disrupt Transformers Are Driven by Frequency
Giovanni Puccetti
Anna Rogers
Aleksandr Drozd
F. Dell’Orletta
79
42
0
23 May 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
279
1,996
0
31 Dec 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
264
4,489
0
23 Jan 2020