Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments

13 February 2022

Papers citing "Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments"

Beyond Words: A Latent Memory Approach to Internal Reasoning in LLMs. José I. Orlicki. 28 Feb 2025
(Mis)Fitting: A Survey of Scaling Laws. Margaret Li, Sneha Kudugunta, Luke Zettlemoyer. 26 Feb 2025
Predicting Emergent Capabilities by Finetuning. Charlie Snell, Eric Wallace, Dan Klein, Sergey Levine. 25 Nov 2024
A Hitchhiker's Guide to Scaling Law Estimation. Leshem Choshen, Yang Zhang, Jacob Andreas. 15 Oct 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models. Tomer Porian, Mitchell Wortsman, J. Jitsev, Ludwig Schmidt, Y. Carmon. 27 Jun 2024
Benchmarking Neural Decoding Backbones towards Enhanced On-edge iBCI Applications. Zhou Zhou, Guohang He, Zheng Zhang, Luziwei Leng, Qinghai Guo, Jianxing Liao, Xuan Song, Ran Cheng. 08 Jun 2024
Language models scale reliably with over-training and on downstream tasks. S. Gadre, Georgios Smyrnis, Vaishaal Shankar, Suchin Gururangan, Mitchell Wortsman, ..., Y. Carmon, Achal Dave, Reinhard Heckel, Niklas Muennighoff, Ludwig Schmidt. 13 Mar 2024
The Universal Statistical Structure and Scaling Laws of Chaos and Turbulence. Noam Levi, Yaron Oz. 02 Nov 2023
Text Rendering Strategies for Pixel Language Models. Jonas F. Lotz, Elizabeth Salesky, Phillip Rust, Desmond Elliott. 01 Nov 2023
Assessing Step-by-Step Reasoning against Lexical Negation: A Case Study on Syllogism. Mengyu Ye, Tatsuki Kuribayashi, Jun Suzuki, Goro Kobayashi, Hiroaki Funayama. 23 Oct 2023
Efficient Benchmarking of Language Models. Yotam Perlitz, Elron Bandel, Ariel Gera, Ofir Arviv, L. Ein-Dor, Eyal Shnarch, Noam Slonim, Michal Shmueli-Scheuer, Leshem Choshen. 22 Aug 2023
Scaling Laws Do Not Scale. Fernando Diaz, Michael A. Madaio. 05 Jul 2023
The Underlying Scaling Laws and Universal Statistical Structure of Complex Datasets. Noam Levi, Yaron Oz. 26 Jun 2023
ColD Fusion: Collaborative Descent for Distributed Multitask Finetuning. Shachar Don-Yehiya, Elad Venezian, Colin Raffel, Noam Slonim, Yoav Katz, Leshem Choshen. 02 Dec 2022
Evaluating the Impact of Model Scale for Compositional Generalization in Semantic Parsing. Linlu Qiu, Peter Shaw, Panupong Pasupat, Tianze Shi, Jonathan Herzig, Emily Pitler, Fei Sha, Kristina Toutanova. 24 May 2022

Papers cited by "Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments"

Scale Efficiently: Insights from Pre-training and Fine-tuning Transformers. Yi Tay, Mostafa Dehghani, J. Rao, W. Fedus, Samira Abnar, Hyung Won Chung, Sharan Narang, Dani Yogatama, Ashish Vaswani, Donald Metzler. 22 Sep 2021
The Grammar-Learning Trajectories of Neural Language Models. Leshem Choshen, Guy Hacohen, D. Weinshall, Omri Abend. 13 Sep 2021
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks. Blake Bordelon, Abdulkadir Canatar, C. Pehlevan. 07 Feb 2020
Scaling Laws for Neural Language Models. Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei. 23 Jan 2020
GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding. Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman. 20 Apr 2018
Neural Architecture Search with Reinforcement Learning. Barret Zoph, Quoc V. Le. 05 Nov 2016