ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.10343
  4. Cited By
Towards Understanding Grokking: An Effective Theory of Representation
  Learning

Towards Understanding Grokking: An Effective Theory of Representation Learning

20 May 2022
Ziming Liu
O. Kitouni
Niklas Nolte
Eric J. Michaud
Max Tegmark
Mike Williams
    AI4CE
ArXivPDFHTML

Papers citing "Towards Understanding Grokking: An Effective Theory of Representation Learning"

31 / 31 papers shown
Title
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Reinforcement Learning for Reasoning in Large Language Models with One Training Example
Yiping Wang
Qing Yang
Zhiyuan Zeng
Liliang Ren
L. Liu
...
Jianfeng Gao
Weizhu Chen
S. Wang
Simon S. Du
Yelong Shen
OffRL
ReLM
LRM
120
5
0
29 Apr 2025
Representation Learning on a Random Lattice
Representation Learning on a Random Lattice
Aryeh Brill
OOD
FAtt
AI4CE
73
0
0
28 Apr 2025
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
NeuralGrok: Accelerate Grokking by Neural Gradient Transformation
Xinyu Zhou
Simin Fan
Martin Jaggi
Jie Fu
30
0
0
24 Apr 2025
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Beyond the Next Token: Towards Prompt-Robust Zero-Shot Classification via Efficient Multi-Token Prediction
Junlang Qian
Zixiao Zhu
Hanzhang Zhou
Zijian Feng
Zepeng Zhai
K. Mao
AAML
VLM
40
0
0
04 Apr 2025
Early Stopping Against Label Noise Without Validation Data
Early Stopping Against Label Noise Without Validation Data
Suqin Yuan
Lei Feng
Tongliang Liu
NoLa
104
15
0
11 Feb 2025
Harmonic Loss Trains Interpretable AI Models
Harmonic Loss Trains Interpretable AI Models
David D. Baek
Ziming Liu
Riya Tyagi
Max Tegmark
97
2
0
03 Feb 2025
Physics of Skill Learning
Physics of Skill Learning
Ziming Liu
Yizhou Liu
Eric J. Michaud
Jeff Gore
Max Tegmark
46
1
0
21 Jan 2025
How to explain grokking
How to explain grokking
S. V. Kozyrev
AI4CE
36
0
0
03 Jan 2025
ICLR: In-Context Learning of Representations
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
120
3
0
29 Dec 2024
On Memorization of Large Language Models in Logical Reasoning
On Memorization of Large Language Models in Logical Reasoning
Chulin Xie
Yangsibo Huang
Chiyuan Zhang
Da Yu
Xinyun Chen
Bill Yuchen Lin
Bo Li
Badih Ghazi
Ravi Kumar
LRM
53
20
0
30 Oct 2024
Formation of Representations in Neural Networks
Formation of Representations in Neural Networks
Liu Ziyin
Isaac Chuang
Tomer Galanti
T. Poggio
37
4
0
03 Oct 2024
Zero-shot forecasting of chaotic systems
Zero-shot forecasting of chaotic systems
Yuanzhao Zhang
William Gilpin
AI4TS
37
4
0
24 Sep 2024
Survival of the Fittest Representation: A Case Study with Modular
  Addition
Survival of the Fittest Representation: A Case Study with Modular Addition
Xiaoman Delores Ding
Zifan Carl Guo
Eric J. Michaud
Ziming Liu
Max Tegmark
48
3
0
27 May 2024
The Garden of Forking Paths: Observing Dynamic Parameters Distribution
  in Large Language Models
The Garden of Forking Paths: Observing Dynamic Parameters Distribution in Large Language Models
Carlo Nicolini
Jacopo Staiano
Bruno Lepri
Raffaele Marino
MoE
34
1
0
13 Mar 2024
Tune without Validation: Searching for Learning Rate and Weight Decay on
  Training Sets
Tune without Validation: Searching for Learning Rate and Weight Decay on Training Sets
Lorenzo Brigato
S. Mougiakakou
42
0
0
08 Mar 2024
Opening the AI black box: program synthesis via mechanistic
  interpretability
Opening the AI black box: program synthesis via mechanistic interpretability
Eric J. Michaud
Isaac Liao
Vedang Lad
Ziming Liu
Anish Mudide
Chloe Loughridge
Zifan Carl Guo
Tara Rezaei Kheirkhah
Mateja Vukelić
Max Tegmark
23
12
0
07 Feb 2024
Grokking as Compression: A Nonlinear Complexity Perspective
Grokking as Compression: A Nonlinear Complexity Perspective
Ziming Liu
Ziqian Zhong
Max Tegmark
38
9
0
09 Oct 2023
Grokking as a First Order Phase Transition in Two Layer Networks
Grokking as a First Order Phase Transition in Two Layer Networks
Noa Rubin
Inbar Seroussi
Z. Ringel
37
15
0
05 Oct 2023
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Benign Overfitting and Grokking in ReLU Networks for XOR Cluster Data
Zhiwei Xu
Yutong Wang
Spencer Frei
Gal Vardi
Wei Hu
MLT
28
24
0
04 Oct 2023
It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD
  Generalization for Generative Transformer Models
It Ain't That Bad: Understanding the Mysterious Performance Drop in OOD Generalization for Generative Transformer Models
Xingcheng Xu
Zihao Pan
Haipeng Zhang
Yanqing Yang
LRM
18
2
0
16 Aug 2023
Faith and Fate: Limits of Transformers on Compositionality
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jian
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLM
LRM
30
329
0
29 May 2023
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic
  Interpretability
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
Ziming Liu
Eric Gan
Max Tegmark
26
36
0
04 May 2023
Unifying Grokking and Double Descent
Unifying Grokking and Double Descent
Peter W. Battaglia
David Raposo
Kelsey
37
31
0
10 Mar 2023
A Toy Model of Universality: Reverse Engineering How Networks Learn
  Group Operations
A Toy Model of Universality: Reverse Engineering How Networks Learn Group Operations
Bilal Chughtai
Lawrence Chan
Neel Nanda
21
96
0
06 Feb 2023
Logical Tasks for Measuring Extrapolation and Rule Comprehension
Logical Tasks for Measuring Extrapolation and Rule Comprehension
Ippei Fujisawa
Ryota Kanai
ELM
LRM
28
4
0
14 Nov 2022
Grokking phase transitions in learning local rules with gradient descent
Grokking phase transitions in learning local rules with gradient descent
Bojan Žunkovič
E. Ilievski
63
16
0
26 Oct 2022
Omnigrok: Grokking Beyond Algorithmic Data
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
56
76
0
03 Oct 2022
In-context Learning and Induction Heads
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
250
460
0
24 Sep 2022
Multi-scale Feature Learning Dynamics: Insights for Double Descent
Multi-scale Feature Learning Dynamics: Insights for Double Descent
Mohammad Pezeshki
Amartya Mitra
Yoshua Bengio
Guillaume Lajoie
61
25
0
06 Dec 2021
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning
  Dynamics
Neural Mechanics: Symmetry and Broken Conservation Laws in Deep Learning Dynamics
D. Kunin
Javier Sagastuy-Breña
Surya Ganguli
Daniel L. K. Yamins
Hidenori Tanaka
107
77
0
08 Dec 2020
Contrastive Representation Learning: A Framework and Review
Contrastive Representation Learning: A Framework and Review
Phúc H. Lê Khắc
Graham Healy
Alan F. Smeaton
SSL
AI4TS
175
685
0
10 Oct 2020
1