LLM Circuit Analyses Are Consistent Across Training and Scale

15 July 2024 · arXiv:2407.10827
Curt Tigges, Michael Hanna, Qinan Yu, Stella Biderman

Papers citing "LLM Circuit Analyses Are Consistent Across Training and Scale"

23 / 23 papers shown

Bigram Subnetworks: Mapping to Next Tokens in Transformer Language Models
Tyler A. Chang, Benjamin Bergen
21 Apr 2025

PolyPythias: Stability and Outliers across Fifty Language Model Pre-Training Runs
Oskar van der Wal, Pietro Lesci, Max Müller-Eberstein, Naomi Saphra, Hailey Schoelkopf, Willem H. Zuidema, Stella Biderman
12 Mar 2025

À la recherche du sens perdu: your favourite LLM might have more to say than you can understand
K. O. T. Erziev
28 Feb 2025

Promote, Suppress, Iterate: How Language Models Answer One-to-Many Factual Queries
Tianyi Lorena Yan, Robin Jia
27 Feb 2025

The Lazy Student's Dream: ChatGPT Passing an Engineering Course on Its Own
Gokul Puthumanaillam, Timothy Bretl, Melkior Ornik
23 Feb 2025

How Do LLMs Acquire New Knowledge? A Knowledge Circuits Perspective on Continual Pre-Training
Yixin Ou, Yunzhi Yao, N. Zhang, Hui Jin, Jiacheng Sun, Shumin Deng, ZeLin Li, H. Chen
16 Feb 2025

Mechanistic?
Naomi Saphra, Sarah Wiegreffe
07 Oct 2024

Differentiation and Specialization of Attention Heads via the Refined Local Learning Coefficient
George Wang, Jesse Hoogland, Stan van Wingerden, Zach Furman, Daniel Murfet
03 Oct 2024

Geometric Signatures of Compositionality Across a Language Model's Lifetime
Jin Hwa Lee, Thomas Jiralerspong, Lei Yu, Yoshua Bengio, Emily Cheng
02 Oct 2024

A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
02 Jul 2024

What needs to go right for an induction head? A mechanistic study of in-context learning circuits and their formation
Aaditya K. Singh, Ted Moskovitz, Felix Hill, Stephanie C. Y. Chan, Andrew M. Saxe
10 Apr 2024

Have Faith in Faithfulness: Going Beyond Circuit Overlap When Finding Model Mechanisms
Michael Hanna, Sandro Pezzelle, Yonatan Belinkov
26 Mar 2024

AtP*: An efficient and scalable method for localizing LLM behaviour to components
János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda
01 Mar 2024

Information Flow Routes: Automatically Interpreting Language Models at Scale
Javier Ferrando, Elena Voita
27 Feb 2024

OLMo: Accelerating the Science of Language Models
Dirk Groeneveld, Iz Beltagy, Pete Walsh, Akshita Bhagia, Rodney Michael Kinney, ..., Jesse Dodge, Kyle Lo, Luca Soldaini, Noah A. Smith, Hanna Hajishirzi
01 Feb 2024

Attribution Patching Outperforms Automated Circuit Discovery
Aaquib Syed, Can Rager, Arthur Conmy
16 Oct 2023

Finding Neurons in a Haystack: Case Studies with Sparse Probing
Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas
02 May 2023

How does GPT-2 compute greater-than?: Interpreting mathematical abilities in a pre-trained language model
Michael Hanna, Ollie Liu, Alexandre Variengien
30 Apr 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt
01 Nov 2022

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
24 Sep 2022

Word Acquisition in Neural Language Models
Tyler A. Chang, Benjamin Bergen
05 Oct 2021

Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
24 Feb 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020