ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2305.18741
  4. Cited By
Grokking of Hierarchical Structure in Vanilla Transformers

Grokking of Hierarchical Structure in Vanilla Transformers

30 May 2023
Shikhar Murty
Pratyusha Sharma
Jacob Andreas
Christopher D. Manning
ArXivPDFHTML

Papers citing "Grokking of Hierarchical Structure in Vanilla Transformers"

6 / 6 papers shown
Title
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Implicit Geometry of Next-token Prediction: From Language Sparsity Patterns to Model Representations
Yize Zhao
Tina Behnia
V. Vakilian
Christos Thrampoulidis
124
10
0
20 Feb 2025
Sneaking Syntax into Transformer Language Models with Tree Regularization
Sneaking Syntax into Transformer Language Models with Tree Regularization
Ananjan Nandi
Christopher D. Manning
Shikhar Murty
100
0
0
28 Nov 2024
On Memorization of Large Language Models in Logical Reasoning
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie
Yangsibo Huang
Chiyuan Zhang
Da Yu
Xinyun Chen
Bill Yuchen Lin
Bo Li
Badih Ghazi
Ravi Kumar
LRM
74
33
0
30 Oct 2024
Omnigrok: Grokking Beyond Algorithmic Data
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
68
81
0
03 Oct 2022
Theoretical Limitations of Self-Attention in Neural Sequence Models
Theoretical Limitations of Self-Attention in Neural Sequence Models
Michael Hahn
44
266
0
16 Jun 2019
Using the Output Embedding to Improve Language Models
Using the Output Embedding to Improve Language Models
Ofir Press
Lior Wolf
51
733
0
20 Aug 2016
1