ResearchTrend.AI
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking

30 November 2023
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
    AI4CE

Papers citing "Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking"

8 / 8 papers shown
GrokAlign: Geometric Characterisation and Acceleration of Grokking
Thomas Walker
Ahmed Imtiaz Humayun
Randall Balestriero
Richard G. Baraniuk
37
0
0
14 Jun 2025
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Junlang Qian
Zixiao Zhu
Hanzhang Zhou
Zijian Feng
Zepeng Zhai
K. Mao
AAML, VLM
125
0
0
04 Apr 2025
Position: Solve Layerwise Linear Models First to Understand Neural Dynamical Phenomena (Neural Collapse, Emergence, Lazy/Rich Regime, and Grokking)
Yoonsoo Nam
Seok Hyeong Lee
Clementine Domine
Yea Chan Park
Charles London
Wonyl Choi
Niclas Goring
Seungjai Lee
AI4CE
221
1
0
28 Feb 2025
Grokking at the Edge of Numerical Stability
Lucas Prieto
Melih Barsbey
Pedro A.M. Mediano
Tolga Birdal
135
5
0
08 Jan 2025
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
160
10
0
31 Dec 2024
Emergence in non-neural models: grokking modular arithmetic via average gradient outer product
Neil Rohit Mallinar
Daniel Beaglehole
Libin Zhu
Adityanarayanan Radhakrishnan
Parthe Pandit
Misha Belkin
97
8
0
29 Jul 2024
A rationale from frequency perspective for grokking in training neural network
Zhangchen Zhou
Yaoyu Zhang
Z. Xu
88
2
0
24 May 2024
Towards Uncovering How Large Language Model Works: An Explainability Perspective
Haiyan Zhao
Fan Yang
Bo Shen
Himabindu Lakkaraju
Jundong Li
91
13
0
16 Feb 2024