Grokking phase transitions in learning local rules with gradient descent

26 October 2022

Papers citing "Grokking phase transitions in learning local rules with gradient descent"


Let Me Grok for You: Accelerating Grokking via Embedding Transfer from a Weaker Model Zhiwei Xu Zhiyu Ni Yixin Wang Wei Hu CLL 32 0 0 17 Apr 2025
Grokking Explained: A Statistical Phenomenon B. W. Carvalho Artur Garcez Luís C. Lamb Emílio Vital Brazil 64 0 0 03 Feb 2025
Grokking at the Edge of Numerical Stability Lucas Prieto Melih Barsbey Pedro A.M. Mediano Tolga Birdal 34 3 0 08 Jan 2025
Understanding the Generalization Benefits of Late Learning Rate Decay Yinuo Ren Chao Ma Lexing Ying AI4CE 24 6 0 21 Jan 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking Kaifeng Lyu Jikai Jin Zhiyuan Li Simon S. Du Jason D. Lee Wei Hu AI4CE 36 32 0 30 Nov 2023
Understanding Grokking Through A Robustness Viewpoint Zhiquan Tan Weiran Huang AAML OOD 30 6 0 11 Nov 2023
Grokking Beyond Neural Networks: An Empirical Exploration with Model Complexity Jack Miller Charles O'Neill Thang Bui 24 9 0 26 Oct 2023
To grok or not to grok: Disentangling generalization and memorization on corrupted algorithmic datasets Darshil Doshi Aritra Das Tianyu He Andrey Gromov OOD 32 6 0 19 Oct 2023
Grokking as a First Order Phase Transition in Two Layer Networks Noa Rubin Inbar Seroussi Z. Ringel 31 15 0 05 Oct 2023
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data Zhiwei Xu Yutong Wang Spencer Frei Gal Vardi Wei Hu MLT 26 23 0 04 Oct 2023
The semantic landscape paradigm for neural networks Shreyas Gokhale 21 2 0 18 Jul 2023
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok Pascal Junior Tikeng Notsawo Hattie Zhou Mohammad Pezeshki Irina Rish G. Dumas 17 23 0 23 Jun 2023
Grokking modular arithmetic Andrey Gromov 35 37 0 06 Jan 2023
Positive unlabeled learning with tensor networks Bojan Žunkovič SSL 33 4 0 25 Nov 2022
Deep tensor networks with matrix product operators Bojan Žunkovič 62 4 0 16 Sep 2022
From Tensor Network Quantum States to Tensorial Recurrent Neural Networks Dian Wu R. Rossi F. Vicentini Giuseppe Carleo 96 25 0 24 Jun 2022
Multi-scale Feature Learning Dynamics: Insights for Double Descent Mohammad Pezeshki Amartya Mitra Yoshua Bengio Guillaume Lajoie 53 25 0 06 Dec 2021
Exploring Deep Neural Networks via Layer-Peeled Model: Minority Collapse in Imbalanced Training Cong Fang Hangfeng He Qi Long Weijie J. Su FAtt 122 165 0 29 Jan 2021
Modeling Sequences with Quantum States: A Look Under the Hood T. Bradley Miles E. Stoudenmire John Terilla 68 48 0 16 Oct 2019