Grokking of Hierarchical Structure in Vanilla Transformers

30 May 2023

Papers citing "Grokking of Hierarchical Structure in Vanilla Transformers"

Title | Authors | Tags | Counts | Date
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations | Yize Zhao, Tina Behnia, V. Vakilian, Christos Thrampoulidis | - | 124 10 0 | 20 Feb 2025
Sneaking Syntax into Transformer Language Models with Tree Regularization | Ananjan Nandi, Christopher D. Manning, Shikhar Murty | - | 100 0 0 | 28 Nov 2024
On Memorization of Large Language Models in Logical Reasoning | Chulin Xie, Yangsibo Huang, Chiyuan Zhang, Da Yu, Xinyun Chen, Bill Yuchen Lin, Bo Li, Badih Ghazi, Ravi Kumar | LRM | 74 33 0 | 30 Oct 2024
Omnigrok: Grokking Beyond Algorithmic Data | Ziming Liu, Eric J. Michaud, Max Tegmark | - | 68 81 0 | 03 Oct 2022
Theoretical Limitations of Self-Attention in Neural Sequence Models | Michael Hahn | - | 44 266 0 | 16 Jun 2019
Using the Output Embedding to Improve Language Models | Ofir Press, Lior Wolf | - | 51 733 0 | 20 Aug 2016