AtP*: An efficient and scalable method for localizing LLM behaviour to components

arXiv: 2403.00745
1 March 2024
János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda
KELM

Papers citing "AtP*: An efficient and scalable method for localizing LLM behaviour to components"

12 / 12 papers shown
Identifying Sub-networks in Neural Networks via Functionally Similar Representations
Tian Gao, Amit Dhurandhar, Karthikeyan N. Ramamurthy, Dennis L. Wei
21 Oct 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
02 Jul 2024
Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
24 Jun 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
28 Mar 2024
How do Language Models Bind Entities in Context?
Jiahai Feng, Jacob Steinhardt
26 Oct 2023
Circuit Component Reuse Across Tasks in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
12 Oct 2023
Rigorously Assessing Natural Language Explanations of Neurons
Jing-ling Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts
MILM
19 Sep 2023
LEACE: Perfect linear concept erasure in closed form
Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman
KELMMU
06 Jun 2023
Sparse Interventions in Language Models with Differentiable Masking
Nicola De Cao, Leon Schmid, Dieuwke Hupkes, Ivan Titov
13 Dec 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
31 Dec 2020
Discovering the Compositional Structure of Vector Representations with Role Learning Networks
Paul Soulos, R. Thomas McCoy, Tal Linzen, P. Smolensky
CoGe
21 Oct 2019
Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig
MoE
25 May 2019