ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2303.11873
  4. Cited By
A Tale of Two Circuits: Grokking as Competition of Sparse and Dense
  Subnetworks

A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks

21 March 2023
William Merrill
Nikolaos Tsilivis
Aman Shukla
ArXiv (abs)PDFHTML

Papers citing "A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks"

40 / 40 papers shown
Title
How Do Transformers Learn Variable Binding in Symbolic Programs?
How Do Transformers Learn Variable Binding in Symbolic Programs?
Yiwei Wu
Atticus Geiger
Raphaël Millière
NAI
41
1
0
27 May 2025
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model
Zhiwei Xu
Zhiyu Ni
Yixin Wang
Wei Hu
CLL
105
2
0
17 Apr 2025
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Between Circuits and Chomsky: Pre-pretraining on Formal Languages Imparts Linguistic Biases
Michael Y. Hu
Jackson Petty
Chuan Shi
William Merrill
Tal Linzen
AI4CE
135
2
0
26 Feb 2025
Mechanistic?
Mechanistic?
Naomi Saphra
Sarah Wiegreffe
AI4CE
75
13
0
07 Oct 2024
Grokking at the Edge of Linear Separability
Grokking at the Edge of Linear Separability
Alon Beck
Noam Levi
Yohai Bar-Sinai
80
1
0
06 Oct 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis
Kumar Kshitij Patel
Samuel Wheeler
Pedro H. P. Savarese
Gal Vardi
Karen Livescu
Michael Maire
Matthew R. Walter
112
3
0
21 Aug 2024
Information-Theoretic Progress Measures reveal Grokking is an Emergent
  Phase Transition
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
84
4
0
16 Aug 2024
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Knowledge Mechanisms in Large Language Models: A Survey and Perspective
Meng Wang
Yunzhi Yao
Ziwen Xu
Shuofei Qiao
Shumin Deng
...
Yong Jiang
Pengjun Xie
Fei Huang
Huajun Chen
Ningyu Zhang
141
39
0
22 Jul 2024
Grokking Modular Polynomials
Grokking Modular Polynomials
Darshil Doshi
Tianyu He
Aritra Das
Andrey Gromov
89
5
0
05 Jun 2024
A rationale from frequency perspective for grokking in training neural
  network
A rationale from frequency perspective for grokking in training neural network
Zhangchen Zhou
Yaoyu Zhang
Z. Xu
88
2
0
24 May 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to
  the Edge of Generalization
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Boshi Wang
Xiang Yue
Yu-Chuan Su
Huan Sun
LRM
151
50
0
23 May 2024
Progress Measures for Grokking on Real-world Tasks
Progress Measures for Grokking on Real-world Tasks
Satvik Golechha
96
2
0
21 May 2024
Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
Kabir Ahuja
Vidhisha Balachandran
Madhur Panwar
Tianxing He
Noah A. Smith
Navin Goyal
Yulia Tsvetkov
106
8
0
25 Apr 2024
Mechanistic Interpretability for AI Safety -- A Review
Mechanistic Interpretability for AI Safety -- A Review
Leonard Bereska
E. Gavves
AI4CE
137
158
0
22 Apr 2024
Eigenpruning: an Interpretability-Inspired PEFT Method
Eigenpruning: an Interpretability-Inspired PEFT Method
Tomás Vergara-Browne
Álvaro Soto
A. Aizawa
86
1
0
04 Apr 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution
  in Large Language Models
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini
Jacopo Staiano
Bruno Lepri
Raffaele Marino
MoE
62
1
0
13 Mar 2024
The Heuristic Core: Understanding Subnetwork Generalization in
  Pretrained Language Models
The Heuristic Core: Understanding Subnetwork Generalization in Pretrained Language Models
Adithya Bhaskar
Dan Friedman
Danqi Chen
114
7
0
06 Mar 2024
Complexity Matters: Dynamics of Feature Learning in the Presence of
  Spurious Correlations
Complexity Matters: Dynamics of Feature Learning in the Presence of Spurious Correlations
GuanWen Qiu
Da Kuang
Surbhi Goel
110
8
0
05 Mar 2024
The Evolution of Statistical Induction Heads: In-Context Learning Markov
  Chains
The Evolution of Statistical Induction Heads: In-Context Learning Markov Chains
Benjamin L. Edelman
Ezra Edelman
Surbhi Goel
Eran Malach
Nikolaos Tsilivis
BDL
91
56
0
16 Feb 2024
Towards Uncovering How Large Language Model Works: An Explainability
  Perspective
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Jundong Li
91
13
0
16 Feb 2024
Measuring Sharpness in Grokking
Measuring Sharpness in Grokking
Jack Miller
Patrick Gleeson
Charles OÑeill
Thang Bui
Noam Levi
71
1
0
14 Feb 2024
Grokking Group Multiplication with Cosets
Grokking Group Multiplication with Cosets
Dashiell Stander
Qinan Yu
Honglu Fan
Stella Biderman
93
11
0
11 Dec 2023
Interpretability Illusions in the Generalization of Simplified Models
Interpretability Illusions in the Generalization of Simplified Models
Dan Friedman
Andrew Kyle Lampinen
Lucas Dixon
Danqi Chen
Asma Ghandeharioun
113
15
0
06 Dec 2023
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce
  Grokking
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
92
38
0
30 Nov 2023
Understanding Grokking Through A Robustness Viewpoint
Understanding Grokking Through A Robustness Viewpoint
Zhiquan Tan
Weiran Huang
AAMLOOD
64
7
0
11 Nov 2023
Outliers with Opposing Signals Have an Outsized Effect on Neural Network
  Optimization
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
Elan Rosenfeld
Andrej Risteski
90
12
0
07 Nov 2023
Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks
Bridging Lottery Ticket and Grokking: Understanding Grokking from Inner Structure of Networks
Gouki Minegishi
Yusuke Iwasawa
Yutaka Matsuo
65
3
0
30 Oct 2023
In-Context Learning Dynamics with Random Binary Sequences
In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
T. Ullman
92
4
0
26 Oct 2023
Grokking Beyond Neural Networks: An Empirical Exploration with Model
  Complexity
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity
Jack Miller
Charles OÑeill
Thang Bui
77
10
0
26 Oct 2023
Grokking in Linear Estimators -- A Solvable Model that Groks without
  Understanding
Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding
Noam Levi
Alon Beck
Yohai Bar-Sinai
66
16
0
25 Oct 2023
To grok or not to grok: Disentangling generalization and memorization on
  corrupted algorithmic datasets
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
Darshil Doshi
Aritra Das
Tianyu He
Andrey Gromov
OOD
112
7
0
19 Oct 2023
Grokking as Compression: A Nonlinear Complexity Perspective
Grokking as Compression: A Nonlinear Complexity Perspective
Ziming Liu
Ziqian Zhong
Max Tegmark
65
10
0
09 Oct 2023
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu
Yutong Wang
Spencer Frei
Gal Vardi
Wei Hu
MLT
92
28
0
04 Oct 2023
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and
  Simplicity Bias in MLMs
Sudden Drops in the Loss: Syntax Acquisition, Phase Transitions, and Simplicity Bias in MLMs
Angelica Chen
Ravid Schwartz-Ziv
Kyunghyun Cho
Matthew L. Leavitt
Naomi Saphra
145
74
0
13 Sep 2023
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and
  Luck
Pareto Frontiers in Neural Feature Learning: Data, Compute, Width, and Luck
Benjamin L. Edelman
Surbhi Goel
Sham Kakade
Eran Malach
Cyril Zhang
91
8
0
07 Sep 2023
Explaining grokking through circuit efficiency
Explaining grokking through circuit efficiency
Vikrant Varma
Rohin Shah
Zachary Kenton
János Kramár
Ramana Kumar
97
55
0
05 Sep 2023
Latent State Models of Training Dynamics
Latent State Models of Training Dynamics
Michael Y. Hu
Angelica Chen
Naomi Saphra
Kyunghyun Cho
115
8
0
18 Aug 2023
The semantic landscape paradigm for neural networks
The semantic landscape paradigm for neural networks
Shreyas Gokhale
94
2
0
18 Jul 2023
Faith and Fate: Limits of Transformers on Compositionality
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jian
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLMLRM
201
388
0
29 May 2023
Break It Down: Evidence for Structural Compositionality in Neural
  Networks
Break It Down: Evidence for Structural Compositionality in Neural Networks
Michael A. Lepori
Thomas Serre
Ellie Pavlick
99
37
0
26 Jan 2023
1