arXiv:2403.07809
pyvene: A Library for Understanding and Improving PyTorch Models via Interventions
12 March 2024
Zhengxuan Wu
Atticus Geiger
Aryaman Arora
Jing-ling Huang
Zheng Wang
Noah D. Goodman
Christopher D. Manning
Christopher Potts
MU
Papers citing
"pyvene: A Library for Understanding and Improving PyTorch Models via Interventions"
10 / 10 papers shown
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller
Atticus Geiger
Sarah Wiegreffe
Dana Arad
Iván Arcuschin
...
Alessandro Stolfo
Martin Tutek
Amir Zur
David Bau
Yonatan Belinkov
43
1
0
17 Apr 2025
Enhancing Hallucination Detection through Noise Injection
Litian Liu
Reza Pourreza
Sunny Panchal
Apratim Bhattacharyya
Yao Qin
Roland Memisevic
HILM
75
2
0
06 Feb 2025
Controllable Context Sensitivity and the Knob Behind It
Julian Minder
Kevin Du
Niklas Stoehr
Giovanni Monea
Chris Wendler
Robert West
Ryan Cotterell
KELM
49
3
0
11 Nov 2024
Personality Alignment of Large Language Models
Minjun Zhu
Linyi Yang
Yue Zhang
ALM
64
5
0
21 Aug 2024
Penzai + Treescope: A Toolkit for Interpreting, Visualizing, and Editing Models As Data
Mingshu Li
36
3
0
01 Aug 2024
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng
Stuart Russell
Jacob Steinhardt
HILM
43
6
0
27 Jun 2024
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
75
7
0
07 Nov 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models
Mor Geva
Jasmijn Bastings
Katja Filippova
Amir Globerson
KELM
191
261
0
28 Apr 2023
Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations
Atticus Geiger
Zhengxuan Wu
Christopher Potts
Thomas F. Icard
Noah D. Goodman
CML
75
98
0
05 Mar 2023
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
212
494
0
01 Nov 2022