ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2301.05217
  4. Cited By
Progress measures for grokking via mechanistic interpretability
v1v2v3 (latest)

Progress measures for grokking via mechanistic interpretability

12 January 2023
Neel Nanda
Lawrence Chan
Tom Lieberum
Jess Smith
Jacob Steinhardt
ArXiv (abs)PDFHTML

Papers citing "Progress measures for grokking via mechanistic interpretability"

25 / 125 papers shown
Title
Labeling Neural Representations with Inverse Recognition
Labeling Neural Representations with Inverse Recognition
Kirill Bykov
Laura Kopf
Shinichi Nakajima
Marius Kloft
Marina M.-C. Höhne
BDL
122
20
0
22 Nov 2023
Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data
  Fitted Networks
Scaling TabPFN: Sketching and Feature Selection for Tabular Prior-Data Fitted Networks
Ben Feuer
Chinmay Hegde
Niv Cohen
108
11
0
17 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
161
7
0
07 Nov 2023
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open
  Challenges and Interdisciplinary Research Directions
Explainable Artificial Intelligence (XAI) 2.0: A Manifesto of Open Challenges and Interdisciplinary Research Directions
Luca Longo
Mario Brcic
Federico Cabitza
Jaesik Choi
Roberto Confalonieri
...
Andrés Páez
Wojciech Samek
Johannes Schneider
Timo Speith
Simone Stumpf
150
226
0
30 Oct 2023
In-Context Learning Dynamics with Random Binary Sequences
In-Context Learning Dynamics with Random Binary Sequences
Eric J. Bigelow
Ekdeep Singh Lubana
Robert P. Dick
Hidenori Tanaka
T. Ullman
92
4
0
26 Oct 2023
Towards a Mechanistic Interpretation of Multi-Step Reasoning
  Capabilities of Language Models
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models
Buse Giledereli
Jiaoda Li
Yu Fei
Alessandro Stolfo
Wangchunshu Zhou
Guangtao Zeng
Antoine Bosselut
Mrinmaya Sachan
LRM
124
47
0
23 Oct 2023
When can transformers reason with abstract symbols?
When can transformers reason with abstract symbols?
Enric Boix-Adserà
Omid Saremi
Emmanuel Abbe
Samy Bengio
Etai Littwin
Josh Susskind
LRMNAI
66
17
0
15 Oct 2023
Deep Neural Networks Can Learn Generalizable Same-Different Visual
  Relations
Deep Neural Networks Can Learn Generalizable Same-Different Visual Relations
Alexa R. Tartaglini
Sheridan Feucht
Michael A. Lepori
Wai Keen Vong
Charles Lovering
Brenden M. Lake
Ellie Pavlick
ViTOOD
57
4
0
14 Oct 2023
Measuring Feature Sparsity in Language Models
Measuring Feature Sparsity in Language Models
Mingyang Deng
Lucas Tao
Joe Benton
61
1
0
11 Oct 2023
Phase codes emerge in recurrent neural networks optimized for modular arithmetic
Phase codes emerge in recurrent neural networks optimized for modular arithmetic
Keith T. Murray
25
1
0
11 Oct 2023
Interpreting CLIP's Image Representation via Text-Based Decomposition
Interpreting CLIP's Image Representation via Text-Based Decomposition
Yossi Gandelsman
Alexei A. Efros
Jacob Steinhardt
VLM
84
101
0
09 Oct 2023
Towards Best Practices of Activation Patching in Language Models:
  Metrics and Methods
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang
Neel Nanda
LLMSV
205
115
0
27 Sep 2023
Circuit Breaking: Removing Model Behaviors with Targeted Ablation
Circuit Breaking: Removing Model Behaviors with Targeted Ablation
Maximilian Li
Xander Davies
Max Nadeau
KELMMU
75
29
0
12 Sep 2023
FIND: A Function Description Benchmark for Evaluating Interpretability
  Methods
FIND: A Function Description Benchmark for Evaluating Interpretability Methods
Sarah Schwettmann
Tamar Rott Shaham
Joanna Materzyñska
Neil Chowdhury
Shuang Li
Jacob Andreas
David Bau
Antonio Torralba
56
22
0
07 Sep 2023
NeuroSurgeon: A Toolkit for Subnetwork Analysis
NeuroSurgeon: A Toolkit for Subnetwork Analysis
Michael A. Lepori
Ellie Pavlick
Thomas Serre
83
7
0
01 Sep 2023
Large Language Models
Large Language Models
Michael R Douglas
LLMAGLM&MA
177
645
0
11 Jul 2023
Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities
Towards Regulatable AI Systems: Technical Gaps and Policy Opportunities
Xudong Shen
H. Brown
Jiashu Tao
Martin Strobel
Yao Tong
Akshay Narayan
Harold Soh
Finale Doshi-Velez
98
3
0
22 Jun 2023
Schema-learning and rebinding as mechanisms of in-context learning and
  emergence
Schema-learning and rebinding as mechanisms of in-context learning and emergence
Siva K. Swaminathan
Antoine Dedieu
Rajkumar Vasudeva Raju
Murray Shanahan
Miguel Lazaro-Gredilla
Dileep George
97
14
0
16 Jun 2023
Adversarial Attacks on the Interpretation of Neuron Activation
  Maximization
Adversarial Attacks on the Interpretation of Neuron Activation Maximization
Géraldin Nanfack
A. Fulleringer
Jonathan Marty
Michael Eickenberg
Eugene Belilovsky
AAMLFAtt
69
11
0
12 Jun 2023
Birth of a Transformer: A Memory Viewpoint
Birth of a Transformer: A Memory Viewpoint
A. Bietti
Vivien A. Cabannes
Diane Bouchacourt
Hervé Jégou
Léon Bottou
112
96
0
01 Jun 2023
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Physics of Language Models: Part 1, Learning Hierarchical Language Structures
Zeyuan Allen-Zhu
Yuanzhi Li
112
21
0
23 May 2023
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic
  Interpretability
Seeing is Believing: Brain-Inspired Modular Training for Mechanistic Interpretability
Ziming Liu
Eric Gan
Max Tegmark
82
40
0
04 May 2023
Towards Automated Circuit Discovery for Mechanistic Interpretability
Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy
Augustine N. Mavor-Parker
Aengus Lynch
Stefan Heimersheim
Adrià Garriga-Alonso
68
319
0
28 Apr 2023
Tracr: Compiled Transformers as a Laboratory for Interpretability
Tracr: Compiled Transformers as a Laboratory for Interpretability
David Lindner
János Kramár
Sebastian Farquhar
Matthew Rahtz
Tom McGrath
Vladimir Mikulik
130
75
0
12 Jan 2023
Omnigrok: Grokking Beyond Algorithmic Data
Omnigrok: Grokking Beyond Algorithmic Data
Ziming Liu
Eric J. Michaud
Max Tegmark
115
85
0
03 Oct 2022
Previous
123