2305.13673
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
23 May 2023
Zeyuan Allen-Zhu
Yuanzhi Li
Papers citing
"Physics of Language Models: Part 1, Learning Hierarchical Language Structures"
16 / 16 papers shown
Title
Learning curves theory for hierarchically compositional data with power-law distributed features
Francesco Cagnetta
Hyunmo Kang
M. Wyart
36
0
0
11 May 2025
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
Francesco Cagnetta
Alessandro Favero
Antonio Sclocchi
M. Wyart
26
0
0
11 May 2025
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi
Ryotaro Kawata
Naoki Nishikawa
Kazusato Oko
Shoichiro Yamaguchi
Sosuke Kobayashi
Seiya Tokui
K. Hayashi
Daisuke Okanohara
Taiji Suzuki
AI4CE
35
0
0
24 Apr 2025
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Runlong Zhou
Yi Zhang
RALM
56
0
0
02 Apr 2025
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur
Ekdeep Singh Lubana
Thomas Fel
Demba Ba
42
4
0
03 Mar 2025
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
118
3
0
29 Dec 2024
Theoretical limitations of multi-layer Transformer
Lijie Chen
Binghui Peng
Hongxun Wu
AI4CE
72
6
0
04 Dec 2024
Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi
Christopher D. Manning
Shikhar Murty
74
0
0
28 Nov 2024
On the Role of Depth and Looping for In-Context Learning with Task Diversity
Khashayar Gatmiry
Nikunj Saunshi
Sashank J. Reddi
Stefanie Jegelka
Sanjiv Kumar
29
2
0
29 Oct 2024
Analyzing (In)Abilities of SAEs via Formal Languages
Abhinav Menon
Manish Shrivastava
David M. Krueger
Ekdeep Singh Lubana
44
7
0
15 Oct 2024
Physics of Language Models: Part 3.2, Knowledge Manipulation
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
17
84
0
25 Sep 2023
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
48
127
0
25 Sep 2023
Do Transformers Parse while Predicting the Masked Word?
Haoyu Zhao
A. Panigrahi
Rong Ge
Sanjeev Arora
76
31
0
14 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
496
0
01 Nov 2022
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova DasSarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
250
460
0
24 Sep 2022
Neural Networks and the Chomsky Hierarchy
Grégoire Delétang
Anian Ruoss
Jordi Grau-Moya
Tim Genewein
L. Wenliang
...
Chris Cundy
Marcus Hutter
Shane Legg
Joel Veness
Pedro A. Ortega
UQCV
107
130
0
05 Jul 2022