arXiv:2005.13630
Explaining Neural Networks by Decoding Layer Activations
27 May 2020
Johannes Schneider
Michalis Vlachos
AI4CE
Papers citing "Explaining Neural Networks by Decoding Layer Activations"
4 / 4 papers shown
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
46
112
0
28 Mar 2024
Does Localization Inform Editing? Surprising Differences in Causality-Based Localization vs. Knowledge Editing in Language Models
Peter Hase
Joey Tianyi Zhou
Been Kim
Asma Ghandeharioun
MILM
36
167
0
10 Jan 2023
Concept-based Adversarial Attacks: Tricking Humans and Classifiers Alike
Johannes Schneider
Giovanni Apruzzese
AAML
24
8
0
18 Mar 2022
Deceptive AI Explanations: Creation and Detection
Johannes Schneider
Christian Meske
Michalis Vlachos
14
28
0
21 Jan 2020