Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck

11 April 2024

Papers citing "Why do small language models underperform? Studying Language Model Saturation via the Softmax Bottleneck"


Title
Small Language Models: Survey, Measurements, and Insights | Zhenyan Lu, Xiang Li, Dongqi Cai, Rongjie Yi, Fangming Liu, Xiwen Zhang, Nicholas D. Lane, Mengwei Xu | ObjD, LRM | 58 | 36 | 0 | 24 Sep 2024
Norm of Mean Contextualized Embeddings Determines their Variance | Hiroaki Yamagiwa, Hidetoshi Shimodaira | | 27 | 0 | 0 | 17 Sep 2024
Outlier Dimensions that Disrupt Transformers Are Driven by Frequency | Giovanni Puccetti, Anna Rogers, Aleksandr Drozd, F. Dell’Orletta | | 79 | 42 | 0 | 23 May 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 279 | 1,996 | 0 | 31 Dec 2020
Scaling Laws for Neural Language Models | Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei | | 264 | 4,489 | 0 | 23 Jan 2020