ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.13673
  4. Cited By
Physics of Language Models: Part 1, Learning Hierarchical Language Structures

Physics of Language Models: Part 1, Learning Hierarchical Language Structures

23 May 2023
Zeyuan Allen-Zhu
Yuanzhi Li
ArXivPDFHTML

Papers citing "Physics of Language Models: Part 1, Learning Hierarchical Language Structures"

16 / 16 papers shown
Title
Learning curves theory for hierarchically compositional data with power-law distributed features
Learning curves theory for hierarchically compositional data with power-law distributed features
Francesco Cagnetta
Hyunmo Kang
M. Wyart
36
0
0
11 May 2025
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
Scaling Laws and Representation Learning in Simple Hierarchical Languages: Transformers vs. Convolutional Architectures
Francesco Cagnetta
Alessandro Favero
Antonio Sclocchi
M. Wyart
26
0
0
11 May 2025
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
When Does Metadata Conditioning (NOT) Work for Language Model Pre-Training? A Study with Context-Free Grammars
Rei Higuchi
Ryotaro Kawata
Naoki Nishikawa
Kazusato Oko
Shoichiro Yamaguchi
Sosuke Kobayashi
Seiya Tokui
K. Hayashi
Daisuke Okanohara
Taiji Suzuki
AI4CE
35
0
0
24 Apr 2025
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
CASCADE Your Datasets for Cross-Mode Knowledge Retrieval of Language Models
Runlong Zhou
Yi Zhang
RALM
56
0
0
02 Apr 2025
Projecting Assumptions: The Duality Between Sparse Autoencoders and Concept Geometry
Sai Sumedh R. Hindupur
Ekdeep Singh Lubana
Thomas Fel
Demba Ba
42
4
0
03 Mar 2025
ICLR: In-Context Learning of Representations
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
120
3
0
29 Dec 2024
Theoretical limitations of multi-layer Transformer
Theoretical limitations of multi-layer Transformer
Lijie Chen
Binghui Peng
Hongxun Wu
AI4CE
72
6
0
04 Dec 2024
Sneaking Syntax into Transformer Language Models with Tree Regularization
Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi
Christopher D. Manning
Shikhar Murty
74
0
0
28 Nov 2024
On the Role of Depth and Looping for In-Context Learning with Task
  Diversity
On the Role of Depth and Looping for In-Context Learning with Task Diversity
Khashayar Gatmiry
Nikunj Saunshi
Sashank J. Reddi
Stefanie Jegelka
Sanjiv Kumar
29
2
0
29 Oct 2024
Analyzing (In)Abilities of SAEs via Formal Languages
Analyzing (In)Abilities of SAEs via Formal Languages
Abhinav Menon
Manish Shrivastava
David M. Krueger
Ekdeep Singh Lubana
44
7
0
15 Oct 2024
Physics of Language Models: Part 3.2, Knowledge Manipulation
Physics of Language Models: Part 3.2, Knowledge Manipulation
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
19
84
0
25 Sep 2023
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
Physics of Language Models: Part 3.1, Knowledge Storage and Extraction
Zeyuan Allen-Zhu
Yuanzhi Li
KELM
48
127
0
25 Sep 2023
Do Transformers Parse while Predicting the Masked Word?
Do Transformers Parse while Predicting the Masked Word?
Haoyu Zhao
A. Panigrahi
Rong Ge
Sanjeev Arora
76
31
0
14 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object
  Identification in GPT-2 small
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
496
0
01 Nov 2022
In-context Learning and Induction Heads
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
250
460
0
24 Sep 2022
Neural Networks and the Chomsky Hierarchy
Neural Networks and the Chomsky Hierarchy
Grégoire Delétang
Anian Ruoss
Jordi Grau-Moya
Tim Genewein
L. Wenliang
...
Chris Cundy
Marcus Hutter
Shane Legg
Joel Veness
Pedro A. Ortega
UQCV
107
130
0
05 Jul 2022
1