DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models

3 October 2023

Papers citing "DeepDecipher: Accessing and Investigating Neuron Activation in Large Language Models"

Title | Authors | Tags | Counts | Date
Quantifying Feature Space Universality Across Large Language Models via Sparse Autoencoders | Michael Lan, Philip Torr, Austin Meek, Ashkan Khakzar, David M. Krueger, Fazl Barez | | 43 11 0 | 09 Oct 2024
Finding Neurons in a Haystack: Case Studies with Sparse Probing | Wes Gurnee, Neel Nanda, Matthew Pauly, Katherine Harvey, Dmitrii Troitskii, Dimitris Bertsimas | MILM | 162 190 0 | 02 May 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt | | 212 497 0 | 01 Nov 2022
In-context Learning and Induction Heads | Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah | | 250 463 0 | 24 Sep 2022
Toy Models of Superposition | Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah | AAML, MILM | 131 322 0 | 21 Sep 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 282 2,000 0 | 31 Dec 2020
Methods for Interpreting and Understanding Deep Neural Networks | G. Montavon, Wojciech Samek, K. Müller | FaML | 234 2,238 0 | 24 Jun 2017
Efficient Estimation of Word Representations in Vector Space | Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 296 31,267 0 | 16 Jan 2013