2305.18741
Grokking of Hierarchical Structure in Vanilla Transformers
30 May 2023
Shikhar Murty
Pratyusha Sharma
Jacob Andreas
Christopher D. Manning
Papers citing "Grokking of Hierarchical Structure in Vanilla Transformers"
6 / 6 papers shown
Title
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao, Tina Behnia, V. Vakilian, Christos Thrampoulidis
124, 10, 0
20 Feb 2025

Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi, Christopher D. Manning, Shikhar Murty
100, 0, 0
28 Nov 2024

On Memorization of Large Language Models in Logical Reasoning
Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar
LRM
74, 33, 0
30 Oct 2024

Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu, Eric J. Michaud, Max Tegmark
68, 81, 0
03 Oct 2022

Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
44, 266, 0
16 Jun 2019

Using the Output Embedding to Improve Language Models
Ofir Press, Lior Wolf
51, 733, 0
20 Aug 2016