AtP*: An efficient and scalable method for localizing LLM behaviour to components

arXiv: 2403.00745
1 March 2024
János Kramár, Tom Lieberum, Rohin Shah, Neel Nanda
KELM

Papers citing "AtP*: An efficient and scalable method for localizing LLM behaviour to components"

12 / 12 papers shown
Identifying Sub-networks in Neural Networks via Functionally Similar Representations
Tian Gao, Amit Dhurandhar, Karthikeyan N. Ramamurthy, Dennis L. Wei
21 Oct 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao
02 Jul 2024
Finding Transformer Circuits with Edge Pruning
Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen
24 Jun 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks, Can Rager, Eric J. Michaud, Yonatan Belinkov, David Bau, Aaron Mueller
28 Mar 2024
How do Language Models Bind Entities in Context?
Jiahai Feng, Jacob Steinhardt
26 Oct 2023
Circuit Component Reuse Across Tasks in Transformer Language Models
Jack Merullo, Carsten Eickhoff, Ellie Pavlick
12 Oct 2023
Rigorously Assessing Natural Language Explanations of Neurons
Jing-ling Huang, Atticus Geiger, Karel D'Oosterlinck, Zhengxuan Wu, Christopher Potts
MILM
19 Sep 2023
LEACE: Perfect linear concept erasure in closed form
Nora Belrose, David Schneider-Joseph, Shauli Ravfogel, Ryan Cotterell, Edward Raff, Stella Biderman
KELMMU
06 Jun 2023
Sparse Interventions in Language Models with Differentiable Masking
Nicola De Cao, Leon Schmid, Dieuwke Hupkes, Ivan Titov
13 Dec 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy
AIMat
31 Dec 2020
Discovering the Compositional Structure of Vector Representations with Role Learning Networks
Paul Soulos, R. Thomas McCoy, Tal Linzen, P. Smolensky
CoGe
21 Oct 2019
Are Sixteen Heads Really Better than One?
Paul Michel, Omer Levy, Graham Neubig
MoE
25 May 2019