Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition

16 August 2024

Papers citing "Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition"

Title | Authors | Tags | Counts | Date
TRACE for Tracking the Emergence of Semantic Representations in Transformers | Nura Aljaafari, Danilo S. Carvalho, André Freitas | | 85 0 0 | 23 May 2025
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction | Junlang Qian, Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Zepeng Zhai, K. Mao | AAML, VLM | 104 0 0 | 04 Apr 2025
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data | Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu | MLT | 70 28 0 | 04 Oct 2023
Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok | Pascal Junior Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, G. Dumas | | 67 24 0 | 23 Jun 2023
A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks | William Merrill, Nikolaos Tsilivis, Aman Shukla | | 59 54 0 | 21 Mar 2023
Higher-order mutual information reveals synergistic sub-networks for multi-neuron importance | Kenzo Clauw, S. Stramaglia, Daniele Marinazzo | SSL, FAtt | 49 6 0 | 01 Nov 2022
Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit | Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang | | 101 133 0 | 18 Jul 2022
Emergent Abilities of Large Language Models | Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus | ELM, ReLM, LRM | 286 2,511 0 | 15 Jun 2022
Visualizing the Loss Landscape of Neural Nets | Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein | | 258 1,898 0 | 28 Dec 2017
Decoupled Weight Decay Regularization | I. Loshchilov, Frank Hutter | OffRL | 151 2,151 0 | 14 Nov 2017
On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima | N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang | ODL | 429 2,945 0 | 15 Sep 2016