arXiv:2311.18817 (v2, latest)
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
30 November 2023
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
Papers citing
"Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking"
8 / 8 papers shown
Title
GrokAlign: Geometric Characterisation and Acceleration of Grokking
Thomas Walker
Ahmed Imtiaz Humayun
Randall Balestriero
Richard G. Baraniuk
37
0
0
14 Jun 2025
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Junlang Qian
Zixiao Zhu
Hanzhang Zhou
Zijian Feng
Zepeng Zhai
K. Mao
AAML
VLM
125
0
0
04 Apr 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clementine Domine
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
221
1
0
28 Feb 2025
Grokking at the Edge of Numerical Stability
Lucas Prieto
Melih Barsbey
Pedro A.M. Mediano
Tolga Birdal
135
5
0
08 Jan 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
160
10
0
31 Dec 2024
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
Neil Rohit Mallinar
Daniel Beaglehole
Libin Zhu
Adityanarayanan Radhakrishnan
Parthe Pandit
Misha Belkin
97
8
0
29 Jul 2024
A rationale from frequency perspective for grokking in training neural network
Zhangchen Zhou
Yaoyu Zhang
Z. Xu
88
2
0
24 May 2024
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Jundong Li
91
13
0
16 Feb 2024