Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.12201
Cited By
Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT
19 February 2024
Zhengfu He
Xuyang Ge
Qiong Tang
Tianxiang Sun
Qinyuan Cheng
Xipeng Qiu
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Dictionary Learning Improves Patch-Free Circuit Discovery in Mechanistic Interpretability: A Case Study on Othello-GPT"
15 / 15 papers shown
Title
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
Lefei Zhang
Lijie Hu
Di Wang
LRM
163
4
0
17 Feb 2025
Rethinking Evaluation of Sparse Autoencoders through the Representation of Polysemous Words
Gouki Minegishi
Hiroki Furuta
Yusuke Iwasawa
Y. Matsuo
97
2
0
09 Jan 2025
Residual Stream Analysis with Multi-Layer SAEs
Tim Lawson
Lucy Farnik
Conor Houghton
Laurence Aitchison
76
5
0
06 Sep 2024
Knowledge Circuits in Pretrained Transformers
Yunzhi Yao
Ningyu Zhang
Zekun Xi
Meng Wang
Ziwen Xu
Shumin Deng
Huajun Chen
KELM
124
23
0
28 May 2024
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang
Neel Nanda
LLMSV
189
112
0
27 Sep 2023
White-Box Transformers via Sparse Rate Reduction
Yaodong Yu
Sam Buchanan
Druv Pai
Tianzhe Chu
Ziyang Wu
Shengbang Tong
B. Haeffele
Yi Ma
ViT
87
86
0
01 Jun 2023
In-context Learning and Induction Heads
Catherine Olsson
Nelson Elhage
Neel Nanda
Nicholas Joseph
Nova Dassarma
...
Tom B. Brown
Jack Clark
Jared Kaplan
Sam McCandlish
C. Olah
316
516
0
24 Sep 2022
Toy Models of Superposition
Nelson Elhage
Tristan Hume
Catherine Olsson
Nicholas Schiefer
T. Henighan
...
Sam McCandlish
Jared Kaplan
Dario Amodei
Martin Wattenberg
C. Olah
AAML
MILM
183
368
0
21 Sep 2022
LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
Tim Dettmers
M. Lewis
Younes Belkada
Luke Zettlemoyer
MQ
98
653
0
15 Aug 2022
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
248
1,357
0
10 Feb 2022
Word Embedding Visualization Via Dictionary Learning
Juexiao Zhang
Yubei Chen
Brian Cheung
Bruno A. Olshausen
40
16
0
09 Oct 2019
Deep Learning using Rectified Linear Units (ReLU)
Abien Fred Agarap
64
3,229
0
22 Mar 2018
Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning
Stefan Elfwing
E. Uchibe
Kenji Doya
133
1,728
0
10 Feb 2017
Linear Algebraic Structure of Word Senses, with Applications to Polysemy
Sanjeev Arora
Yuanzhi Li
Yingyu Liang
Tengyu Ma
Andrej Risteski
80
283
0
14 Jan 2016
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAI
OCL
397
33,550
0
16 Oct 2013
1