2408.08944
Cited By
Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition
16 August 2024
Kenzo Clauw
S. Stramaglia
Daniele Marinazzo
Papers citing
"Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition"
11 / 11 papers shown
Title
TRACE for Tracking the Emergence of Semantic Representations in Transformers
Nura Aljaafari, Danilo S. Carvalho, André Freitas
85 | 0 | 0
23 May 2025

Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Junlang Qian, Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Zepeng Zhai, K. Mao
AAML, VLM
104 | 0 | 0
04 Apr 2025

Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu
MLT
70 | 28 | 0
04 Oct 2023

Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
Pascal Junior Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, G. Dumas
67 | 24 | 0
23 Jun 2023

A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks
William Merrill, Nikolaos Tsilivis, Aman Shukla
59 | 54 | 0
21 Mar 2023

Higher-order mutual information reveals synergistic sub-networks for multi-neuron importance
Kenzo Clauw, S. Stramaglia, Daniele Marinazzo
SSL, FAtt
49 | 6 | 0
01 Nov 2022

Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang
101 | 133 | 0
18 Jul 2022

Emergent Abilities of Large Language Models
Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus
ELM, ReLM, LRM
286 | 2,511 | 0
15 Jun 2022

Visualizing the Loss Landscape of Neural Nets
Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein
258 | 1,898 | 0
28 Dec 2017

Decoupled Weight Decay Regularization
I. Loshchilov, Frank Hutter
OffRL
151 | 2,151 | 0
14 Nov 2017

On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang
ODL
429 | 2,945 | 0
15 Sep 2016