Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
Francesco Cagnetta, Alessandro Favero, Antonio Sclocchi, Matthieu Wyart
arXiv: 2505.07070 · 11 May 2025

Papers citing "Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures"

23 of 23 citing papers shown (title · authors · topic tags, where present · site metrics · date):

How Compositional Generalization and Creativity Improve as Diffusion Models are Trained
Alessandro Favero, Antonio Sclocchi, Francesco Cagnetta, Pascal Frossard, Matthieu Wyart
DiffM, CoGe · 83 · 6 · 0 · 17 Feb 2025

A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende, Federica Gerace, Alessandro Laio, Sebastian Goldt
120 · 9 · 0 · 17 Feb 2025

Probing the Latent Hierarchical Structure of Data via Diffusion Models
Antonio Sclocchi, Alessandro Favero, Noam Itzhak Levi, Matthieu Wyart
DiffM · 95 · 5 · 0 · 17 Oct 2024

How Feature Learning Can Improve Neural Scaling Laws
Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan
109 · 17 · 0 · 26 Sep 2024

How transformers learn structured data: insights from hierarchical filtering
Jerome Garnier-Brun, Marc Mézard, Emanuele Moscato, Luca Saglietti
134 · 6 · 0 · 27 Aug 2024

What Languages are Easy to Language-Model? A Perspective from Learning Probabilistic Regular Languages
Nadav Borenstein, Anej Svete, R. Chan, Josef Valvoda, Franz Nowak, Isabelle Augenstein, Eleanor Chodroff, Ryan Cotterell
77 · 13 · 0 · 06 Jun 2024

Transformers represent belief state geometry in their residual stream
A. Shai, Sarah E. Marzen, Lucas Teixeira, Alexander Gietelink Oldenziel, P. Riechers
AI4CE · 67 · 17 · 0 · 24 May 2024

A Dynamical Model of Neural Scaling Laws
Blake Bordelon, Alexander B. Atanasov, Cengiz Pehlevan
96 · 44 · 0 · 02 Feb 2024

How Two-Layer Neural Networks Learn, One (Giant) Step at a Time
Yatin Dandi, Florent Krzakala, Bruno Loureiro, Luca Pesce, Ludovic Stephan
MLT · 95 · 29 · 0 · 29 May 2023

Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Zeyuan Allen-Zhu, Yuanzhi Li
112 · 21 · 0 · 23 May 2023

Learning Single-Index Models with Shallow Neural Networks
A. Bietti, Joan Bruna, Clayton Sanford, M. Song
203 · 71 · 0 · 27 Oct 2022

High-dimensional Asymptotics of Feature Learning: How One Gradient Step Improves the Representation
Jimmy Ba, Murat A. Erdogdu, Taiji Suzuki, Zhichao Wang, Denny Wu, Greg Yang
MLT · 91 · 129 · 0 · 03 May 2022

Training Compute-Optimal Large Language Models
Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre
AI4TS · 208 · 1,987 · 0 · 29 Mar 2022

Locality defeats the curse of dimensionality in convolutional teacher-student scenarios
Alessandro Favero, Francesco Cagnetta, Matthieu Wyart
79 · 31 · 0 · 16 Jun 2021

Explaining Neural Scaling Laws
Yasaman Bahri, Ethan Dyer, Jared Kaplan, Jaehoon Lee, Utkarsh Sharma
78 · 269 · 0 · 12 Feb 2021

Learning Curve Theory
Marcus Hutter
218 · 64 · 0 · 08 Feb 2021

Geometric compression of invariant manifolds in neural nets
J. Paccolat, Leonardo Petrini, Mario Geiger, Kevin Tyloo, Matthieu Wyart
MLT · 103 · 36 · 0 · 22 Jul 2020

Spectrum Dependent Learning Curves in Kernel Regression and Wide Neural Networks
Blake Bordelon, Abdulkadir Canatar, Cengiz Pehlevan
256 · 208 · 0 · 07 Feb 2020

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
611 · 4,921 · 0 · 23 Jan 2020

On Lazy Training in Differentiable Programming
Lénaïc Chizat, Edouard Oyallon, Francis R. Bach
111 · 840 · 0 · 19 Dec 2018

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
VLM, SSL, SSeg · 1.8K · 95,229 · 0 · 11 Oct 2018

Convolutional Neural Networks for Sentence Classification
Yoon Kim
AILaw, VLM · 641 · 13,432 · 0 · 25 Aug 2014

A Convolutional Neural Network for Modelling Sentences
Nal Kalchbrenner, Edward Grefenstette, Phil Blunsom
109 · 3,559 · 0 · 08 Apr 2014