A distributional simplicity bias in the learning dynamics of transformers

17 February 2025
Riccardo Rende, Federica Gerace, A. Laio, Sebastian Goldt
arXiv:2410.19637

Papers citing "A distributional simplicity bias in the learning dynamics of transformers"

6 / 6 papers shown
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
Francesco Cagnetta, Alessandro Favero, Antonio Sclocchi, M. Wyart
11 May 2025

A Mathematical Philosophy of Explanations in Mechanistic Interpretability -- The Strange Science Part I.i
Kola Ayonrinde, Louis Jaburi
01 May 2025

Shaping Shared Languages: Human and Large Language Models' Inductive Biases in Emergent Communication
Tom Kouwenhoven, Max Peeperkorn, R. D. Kleijn, Tessa Verhoef
06 Mar 2025

Training Dynamics of In-Context Learning in Linear Attention
Yedi Zhang, Aaditya K. Singh, Peter E. Latham, Andrew Saxe
28 Jan 2025

How transformers learn structured data: insights from hierarchical filtering
Jerome Garnier-Brun, Marc Mézard, Emanuele Moscato, Luca Saglietti
27 Aug 2024

Towards a theory of how the structure of language is acquired by deep neural networks
Francesco Cagnetta, M. Wyart
28 May 2024