2505.07070
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
11 May 2025
Francesco Cagnetta
Alessandro Favero
Antonio Sclocchi
Matthieu Wyart
Papers citing
"Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures"
23 / 23 papers shown
How Compositional Generalization and Creativity Improve as Diffusion Models are Trained
Alessandro Favero
Antonio Sclocchi
Francesco Cagnetta
Pascal Frossard
Matthieu Wyart
DiffM
CoGe
83
6
0
17 Feb 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende
Federica Gerace
Alessandro Laio
Sebastian Goldt
120
9
0
17 Feb 2025
Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi
Alessandro Favero
Noam Itzhak Levi
Matthieu Wyart
DiffM
98
5
0
17 Oct 2024
How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
109
17
0
26 Sep 2024
How transformers learn structured data: insights from hierarchical filtering
Jerome Garnier-Brun
Marc Mézard
Emanuele Moscato
Luca Saglietti
134
6
0
27 Aug 2024
What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Nadav Borenstein
Anej Svete
R. Chan
Josef Valvoda
Franz Nowak
Isabelle Augenstein
Eleanor Chodroff
Ryan Cotterell
77
13
0
06 Jun 2024
Transformers represent belief state geometry in their residual stream
A. Shai
Sarah E. Marzen
Lucas Teixeira
Alexander Gietelink Oldenziel
P. Riechers
AI4CE
67
17
0
24 May 2024
A Dynamical Model of Neural Scaling Laws
Blake Bordelon
Alexander B. Atanasov
Cengiz Pehlevan
96
44
0
02 Feb 2024
How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi
Florent Krzakala
Bruno Loureiro
Luca Pesce
Ludovic Stephan
MLT
95
29
0
29 May 2023
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Zeyuan Allen-Zhu
Yuanzhi Li
112
21
0
23 May 2023
Learning Single-Index Models with Shallow Neural Networks
A. Bietti
Joan Bruna
Clayton Sanford
M. Song
203
71
0
27 Oct 2022
High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba
Murat A. Erdogdu
Taiji Suzuki
Zhichao Wang
Denny Wu
Greg Yang
MLT
91
129
0
03 May 2022
Training Compute-Optimal Large Language Models
Jordan Hoffmann
Sebastian Borgeaud
A. Mensch
Elena Buchatskaya
Trevor Cai
...
Karen Simonyan
Erich Elsen
Jack W. Rae
Oriol Vinyals
Laurent Sifre
AI4TS
208
1,987
0
29 Mar 2022
Locality defeats the curse of dimensionality in convolutional teacher-student scenarios
Alessandro Favero
Francesco Cagnetta
Matthieu Wyart
79
31
0
16 Jun 2021
Explaining Neural Scaling Laws
Yasaman Bahri
Ethan Dyer
Jared Kaplan
Jaehoon Lee
Utkarsh Sharma
78
269
0
12 Feb 2021
Learning Curve Theory
Marcus Hutter
218
64
0
08 Feb 2021
Geometric compression of invariant manifolds in neural nets
J. Paccolat
Leonardo Petrini
Mario Geiger
Kevin Tyloo
Matthieu Wyart
MLT
103
36
0
22 Jul 2020
Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Blake Bordelon
Abdulkadir Canatar
Cengiz Pehlevan
258
208
0
07 Feb 2020
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
611
4,921
0
23 Jan 2020
On Lazy Training in Differentiable Programming
Lénaïc Chizat
Edouard Oyallon
Francis R. Bach
111
840
0
19 Dec 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
Convolutional Neural Networks for Sentence Classification
Yoon Kim
AILaw
VLM
641
13,432
0
25 Aug 2014
A Convolutional Neural Network for Modelling Sentences
Nal Kalchbrenner
Edward Grefenstette
Phil Blunsom
109
3,559
0
08 Apr 2014