The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon


10 June 2022

Papers citing "The Slingshot Mechanism: An Empirical Study of Adaptive Optimizers and the Grokking Phenomenon"


Grokking in the Wild: Data Augmentation for Real-World Multi-Hop Reasoning with Transformers. Roman Abramov, Felix Steinbauer, Gjergji Kasneci. 29 Apr 2025.
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation. Xinyu Zhou, Simin Fan, Martin Jaggi, Jie Fu. 24 Apr 2025.
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction. Junlang Qian, Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Zepeng Zhai, K. Mao. 04 Apr 2025.
Grokking at the Edge of Numerical Stability. Lucas Prieto, Melih Barsbey, Pedro A.M. Mediano, Tolga Birdal. 08 Jan 2025.
Survival of the Fittest Representation: A Case Study with Modular Addition. Xiaoman Delores Ding, Zifan Carl Guo, Eric J. Michaud, Ziming Liu, Max Tegmark. 27 May 2024.
Grokking as Compression: A Nonlinear Complexity Perspective. Ziming Liu, Ziqian Zhong, Max Tegmark. 09 Oct 2023.
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data. Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu. 04 Oct 2023.
Small-scale proxies for large-scale Transformer training instabilities. Mitchell Wortsman, Peter J. Liu, Lechao Xiao, Katie Everett, A. Alemi, ..., Jascha Narain Sohl-Dickstein, Kelvin Xu, Jaehoon Lee, Justin Gilmer, Simon Kornblith. 25 Sep 2023.
Progress measures for grokking via mechanistic interpretability. Neel Nanda, Lawrence Chan, Tom Lieberum, Jess Smith, Jacob Steinhardt. 12 Jan 2023.
Grokking phase transitions in learning local rules with gradient descent. Bojan Žunkovič, E. Ilievski. 26 Oct 2022.
Understanding Gradient Descent on Edge of Stability in Deep Learning. Sanjeev Arora, Zhiyuan Li, A. Panigrahi. 19 May 2022.
The large learning rate phase of deep learning: the catapult mechanism. Aitor Lewkowycz, Yasaman Bahri, Ethan Dyer, Jascha Narain Sohl-Dickstein, Guy Gur-Ari. 04 Mar 2020.