Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok

23 June 2023

Pascal Junior Tikeng Notsawo

Papers citing "Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok"

| Title | Authors | Tags | Counts | Date |
| --- | --- | --- | --- | --- |
| Grokking at the Edge of Linear Separability | Alon Beck, Noam Levi, Yohai Bar-Sinai | | 34 1 0 | 06 Oct 2024 |
| Approaching Deep Learning through the Spectral Dynamics of Weights | David Yunis, Kumar Kshitij Patel, Samuel Wheeler, Pedro H. P. Savarese, Gal Vardi, Karen Livescu, Michael Maire, Matthew R. Walter | | 52 3 0 | 21 Aug 2024 |
| Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition | Kenzo Clauw, S. Stramaglia, Daniele Marinazzo | | 50 3 0 | 16 Aug 2024 |
| Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition | Mohamad Amin Mohamadi, Zhiyuan Li, Lei Wu, Danica J. Sutherland | | 48 9 0 | 17 Jul 2024 |
| Grokking Modular Polynomials | Darshil Doshi, Tianyu He, Aritra Das, Andrey Gromov | | 40 4 0 | 05 Jun 2024 |
| Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization | Boshi Wang, Xiang Yue, Yu-Chuan Su, Huan Sun | LRM | 29 41 0 | 23 May 2024 |
| Unified View of Grokking, Double Descent and Emergent Abilities: A Perspective from Circuits Competition | Yufei Huang, Shengding Hu, Xu Han, Zhiyuan Liu, Maosong Sun | | 64 14 0 | 23 Feb 2024 |
| Critical Data Size of Language Models from a Grokking Perspective | Xuekai Zhu, Yao Fu, Bowen Zhou, Zhouhan Lin | | 22 14 0 | 19 Jan 2024 |
| Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking | Kaifeng Lyu, Jikai Jin, Zhiyuan Li, Simon S. Du, Jason D. Lee, Wei Hu | AI4CE | 41 32 0 | 30 Nov 2023 |
| Understanding Grokking Through A Robustness Viewpoint | Zhiquan Tan, Weiran Huang | AAML, OOD | 35 6 0 | 11 Nov 2023 |
| Outliers with Opposing Signals Have an Outsized Effect on Neural Network Optimization | Elan Rosenfeld, Andrej Risteski | | 25 10 0 | 07 Nov 2023 |
| Grokking in Linear Estimators -- A Solvable Model that Groks without Understanding | Noam Levi, Alon Beck, Yohai Bar-Sinai | | 32 16 0 | 25 Oct 2023 |
| To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets | Darshil Doshi, Aritra Das, Tianyu He, Andrey Gromov | OOD | 34 6 0 | 19 Oct 2023 |
| Grokking as Compression: A Nonlinear Complexity Perspective | Ziming Liu, Ziqian Zhong, Max Tegmark | | 32 9 0 | 09 Oct 2023 |
| Explaining grokking through circuit efficiency | Vikrant Varma, Rohin Shah, Zachary Kenton, János Kramár, Ramana Kumar | | 18 48 0 | 05 Sep 2023 |
| Identifying Equivalent Training Dynamics | William T. Redman, J. M. Bello-Rivas, M. Fonoberova, Ryan Mohr, Ioannis G. Kevrekidis, Igor Mezić | | 27 2 0 | 17 Feb 2023 |
| Grokking phase transitions in learning local rules with gradient descent | Bojan Žunkovič, E. Ilievski | | 63 16 0 | 26 Oct 2022 |
| Multi-scale Feature Learning Dynamics: Insights for Double Descent | Mohammad Pezeshki, Amartya Mitra, Yoshua Bengio, Guillaume Lajoie | | 61 25 0 | 06 Dec 2021 |
| The Intrinsic Dimension of Images and Its Impact on Learning | Phillip E. Pope, Chen Zhu, Ahmed Abdelkader, Micah Goldblum, Tom Goldstein | | 197 260 0 | 18 Apr 2021 |
| The large learning rate phase of deep learning: the catapult mechanism | Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari | ODL | 159 234 0 | 04 Mar 2020 |
| On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 284 2,890 0 | 15 Sep 2016 |