ResearchTrend.AI

Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition

Kenzo Clauw, S. Stramaglia, Daniele Marinazzo
arXiv:2408.08944, 16 August 2024

Papers citing "Information-Theoretic Progress Measures reveal Grokking is an Emergent Phase Transition"

11 / 11 papers shown
  • TRACE for Tracking the Emergence of Semantic Representations in Transformers
    Nura Aljaafari, Danilo S. Carvalho, André Freitas (23 May 2025)
  • Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
    Junlang Qian, Zixiao Zhu, Hanzhang Zhou, Zijian Feng, Zepeng Zhai, K. Mao (04 Apr 2025)
  • Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
    Zhiwei Xu, Yutong Wang, Spencer Frei, Gal Vardi, Wei Hu (04 Oct 2023)
  • Predicting Grokking Long Before it Happens: A look into the loss landscape of models which grok
    Pascal Junior Tikeng Notsawo, Hattie Zhou, Mohammad Pezeshki, Irina Rish, G. Dumas (23 Jun 2023)
  • A Tale of Two Circuits: Grokking as Competition of Sparse and Dense Subnetworks
    William Merrill, Nikolaos Tsilivis, Aman Shukla (21 Mar 2023)
  • Higher-order mutual information reveals synergistic sub-networks for multi-neuron importance
    Kenzo Clauw, S. Stramaglia, Daniele Marinazzo (01 Nov 2022)
  • Hidden Progress in Deep Learning: SGD Learns Parities Near the Computational Limit
    Boaz Barak, Benjamin L. Edelman, Surbhi Goel, Sham Kakade, Eran Malach, Cyril Zhang (18 Jul 2022)
  • Emergent Abilities of Large Language Models
    Jason W. Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, ..., Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, J. Dean, W. Fedus (15 Jun 2022)
  • Visualizing the Loss Landscape of Neural Nets
    Hao Li, Zheng Xu, Gavin Taylor, Christoph Studer, Tom Goldstein (28 Dec 2017)
  • Decoupled Weight Decay Regularization
    I. Loshchilov, Frank Hutter (14 Nov 2017)
  • On Large-Batch Training for Deep Learning: Generalization Gap and Sharp Minima
    N. Keskar, Dheevatsa Mudigere, J. Nocedal, M. Smelyanskiy, P. T. P. Tang (15 Sep 2016)