ResearchTrend.AI
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

23 June 2023
Pascal Junior Tikeng Notsawo
Hattie Zhou
Mohammad Pezeshki
Irina Rish
G. Dumas
arXiv:2306.13253

Papers citing "Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok"

21 / 21 papers shown
Grokking at the Edge of Linear Separability
Alon Beck
Noam Levi
Yohai Bar-Sinai
34
1
0
06 Oct 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis
Kumar Kshitij Patel
Samuel Wheeler
Pedro H. P. Savarese
Gal Vardi
Karen Livescu
Michael Maire
Matthew R. Walter
52
3
0
21 Aug 2024
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
50
3
0
16 Aug 2024
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
Mohamad Amin Mohamadi
Zhiyuan Li
Lei Wu
Danica J. Sutherland
48
9
0
17 Jul 2024
Grokking Modular Polynomials
Darshil Doshi
Tianyu He
Aritra Das
Andrey Gromov
40
4
0
05 Jun 2024
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization
Boshi Wang
Xiang Yue
Yu-Chuan Su
Huan Sun
LRM
29
41
0
23 May 2024
Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition
Yufei Huang
Shengding Hu
Xu Han
Zhiyuan Liu
Maosong Sun
64
14
0
23 Feb 2024
Critical Data Size of Language Models from a Grokking Perspective
Xuekai Zhu
Yao Fu
Bowen Zhou
Zhouhan Lin
22
14
0
19 Jan 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
41
32
0
30 Nov 2023
Understanding Grokking Through A Robustness Viewpoint
Zhiquan Tan
Weiran Huang
AAML
OOD
35
6
0
11 Nov 2023
Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization
Elan Rosenfeld
Andrej Risteski
25
10
0
07 Nov 2023
Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding
Noam Levi
Alon Beck
Yohai Bar-Sinai
32
16
0
25 Oct 2023
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets
Darshil Doshi
Aritra Das
Tianyu He
Andrey Gromov
OOD
34
6
0
19 Oct 2023
Grokking as Compression: A Nonlinear Complexity Perspective
Ziming Liu
Ziqian Zhong
Max Tegmark
32
9
0
09 Oct 2023
Explaining grokking through circuit efficiency
Vikrant Varma
Rohin Shah
Zachary Kenton
János Kramár
Ramana Kumar
18
48
0
05 Sep 2023
Identifying Equivalent Training Dynamics
William T. Redman
J. M. Bello-Rivas
M. Fonoberova
Ryan Mohr
Ioannis G. Kevrekidis
Igor Mezić
27
2
0
17 Feb 2023
Grokking phase transitions in learning local rules with gradient descent
Bojan Žunkovič
E. Ilievski
63
16
0
26 Oct 2022
Multi-scale Feature Learning Dynamics: Insights for Double Descent
Mohammad Pezeshki
Amartya Mitra
Yoshua Bengio
Guillaume Lajoie
61
25
0
06 Dec 2021
The Intrinsic Dimension of Images and Its Impact on Learning
Phillip E. Pope
Chen Zhu
Ahmed Abdelkader
Micah Goldblum
Tom Goldstein
197
260
0
18 Apr 2021
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
159
234
0
04 Mar 2020
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar
Dheevatsa Mudigere
J. Nocedal
M. Smelyanskiy
P. T. P. Tang
ODL
284
2,890
0
15 Sep 2016