From Neurons to Neutrons: A Case Study in Interpretability

27 May 2024

Papers citing "From Neurons to Neutrons: A Case Study in Interpretability"


A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models | Daking Rai, Yilun Zhou, Shi Feng, Abulhair Saparov, Ziyu Yao | 160 33 0 | 02 Jul 2024
The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks | Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas | 75 102 0 | 30 Jun 2023
A Survey on Neural Network Interpretability | Yu Zhang, Peter Tiño, A. Leonardis, K. Tang [FaML, XAI] | 204 682 0 | 28 Dec 2020
Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning | Armen Aghajanyan, Luke Zettlemoyer, Sonal Gupta | 101 570 1 | 22 Dec 2020
Towards a Definition of Disentangled Representations | I. Higgins, David Amos, David Pfau, S. Racanière, Loic Matthey, Danilo Jimenez Rezende, Alexander Lerchner [OCL, DRL] | 108 480 0 | 05 Dec 2018
Disentangling by Factorising | Hyunjik Kim, A. Mnih [CoGe, OOD] | 64 1,356 0 | 16 Feb 2018
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps | Karen Simonyan, Andrea Vedaldi, Andrew Zisserman [FAtt] | 314 7,316 0 | 20 Dec 2013
Visualizing and Understanding Convolutional Networks | Matthew D. Zeiler, Rob Fergus [FAtt, SSL] | 595 15,902 0 | 12 Nov 2013
Efficient Estimation of Word Representations in Vector Space | Tomas Mikolov, Kai Chen, G. Corrado, J. Dean [3DV] | 680 31,544 0 | 16 Jan 2013
Representation Learning: A Review and New Perspectives | Yoshua Bengio, Aaron Courville, Pascal Vincent [OOD, SSL] | 274 12,458 0 | 24 Jun 2012