ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.02997
  4. Cited By
Causal Abstractions of Neural Networks

Causal Abstractions of Neural Networks

6 June 2021
Atticus Geiger
Hanson Lu
Thomas Icard
Christopher Potts
    NAI
    CML
ArXivPDFHTML

Papers citing "Causal Abstractions of Neural Networks"

50 / 56 papers shown
Title
Tracr-Injection: Distilling Algorithms into Pre-trained Language Models
Tracr-Injection: Distilling Algorithms into Pre-trained Language Models
Tomás Vergara-Browne
Álvaro Soto
17
0
0
15 May 2025
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Short-circuiting Shortcuts: Mechanistic Investigation of Shortcuts in Text Classification
Leon Eshuijs
Shihan Wang
Antske Fokkens
33
0
0
09 May 2025
Understanding In-context Learning of Addition via Activation Subspaces
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu
Kayo Yin
Michael I. Jordan
Jacob Steinhardt
Lijie Chen
58
0
0
08 May 2025
Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
Evaluating Explanations: An Explanatory Virtues Framework for Mechanistic Interpretability -- The Strange Science Part I.ii
Kola Ayonrinde
Louis Jaburi
XAI
88
1
0
02 May 2025
MIB: A Mechanistic Interpretability Benchmark
MIB: A Mechanistic Interpretability Benchmark
Aaron Mueller
Atticus Geiger
Sarah Wiegreffe
Dana Arad
Iván Arcuschin
...
Alessandro Stolfo
Martin Tutek
Amir Zur
David Bau
Yonatan Belinkov
53
1
0
17 Apr 2025
Steering off Course: Reliability Challenges in Steering Language Models
Steering off Course: Reliability Challenges in Steering Language Models
Patrick Queiroz Da Silva
Hari Sethuraman
Dheeraj Rajagopal
Hannaneh Hajishirzi
Sachin Kumar
LLMSV
39
1
0
06 Apr 2025
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B
Contextualize-then-Aggregate: Circuits for In-Context Learning in Gemma-2 2B
Aleksandra Bakalova
Yana Veitsman
Xinting Huang
Michael Hahn
38
0
0
31 Mar 2025
Are formal and functional linguistic mechanisms dissociated in language models?
Are formal and functional linguistic mechanisms dissociated in language models?
Michael Hanna
Sandro Pezzelle
Yonatan Belinkov
54
0
0
14 Mar 2025
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ensemble Debiasing Across Class and Sample Levels for Fairer Prompting Accuracy
Ruixi Lin
Ziqiao Wang
Yang You
FaML
89
1
0
07 Mar 2025
Re-Imagining Multimodal Instruction Tuning: A Representation View
Re-Imagining Multimodal Instruction Tuning: A Representation View
Yiyang Liu
James Liang
Ruixiang Tang
Yugyung Lee
Majid Rabbani
...
Raghuveer M. Rao
Lifu Huang
Dongfang Liu
Qifan Wang
Cheng Han
213
0
0
02 Mar 2025
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
Lefei Zhang
Lijie Hu
Di Wang
LRM
100
1
0
17 Feb 2025
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Building Bridges, Not Walls -- Advancing Interpretability by Unifying Feature, Data, and Model Component Attribution
Shichang Zhang
Tessa Han
Usha Bhalla
Hima Lakkaraju
FAtt
160
0
0
17 Feb 2025
Sample-efficient Learning of Concepts with Theoretical Guarantees: from Data to Concepts without Interventions
H. Fokkema
T. Erven
Sara Magliacane
72
1
0
10 Feb 2025
It's Not Just a Phase: On Investigating Phase Transitions in Deep Learning-based Side-channel Analysis
It's Not Just a Phase: On Investigating Phase Transitions in Deep Learning-based Side-channel Analysis
Sengim Karayalçin
Marina Krček
Stjepan Picek
AAML
80
0
0
01 Feb 2025
What is causal about causal models and representations?
What is causal about causal models and representations?
Frederik Hytting Jørgensen
Luigi Gresele
S. Weichwald
CML
113
0
0
31 Jan 2025
Aligning Graphical and Functional Causal Abstractions
Aligning Graphical and Functional Causal Abstractions
Wilem Schooltink
Fabio Massimo Zennaro
73
1
0
22 Dec 2024
Towards Unifying Interpretability and Control: Evaluation via Intervention
Towards Unifying Interpretability and Control: Evaluation via Intervention
Usha Bhalla
Suraj Srinivas
Asma Ghandeharioun
Himabindu Lakkaraju
47
5
0
07 Nov 2024
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Arithmetic Without Algorithms: Language Models Solve Math With a Bag of Heuristics
Yaniv Nikankin
Anja Reusch
Aaron Mueller
Yonatan Belinkov
AIFin
LRM
43
25
0
28 Oct 2024
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Fact Recall, Heuristics or Pure Guesswork? Precise Interpretations of Language Models for Fact Completion
Denitsa Saynova
Lovisa Hagström
Moa Johansson
Richard Johansson
Marco Kuhlmann
HILM
46
0
0
18 Oct 2024
On the Role of Attention Heads in Large Language Model Safety
On the Role of Attention Heads in Large Language Model Safety
Zhenhong Zhou
Haiyang Yu
Xinghua Zhang
Rongwu Xu
Fei Huang
Kun Wang
Yang Liu
Sihang Li
Yongbin Li
59
5
0
17 Oct 2024
Inference and Verbalization Functions During In-Context Learning
Inference and Verbalization Functions During In-Context Learning
Junyi Tao
Xiaoyin Chen
Nelson F. Liu
LRM
ReLM
26
1
0
12 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Racing Thoughts: Explaining Contextualization Errors in Large Language Models
Michael A. Lepori
Michael Mozer
Asma Ghandeharioun
LRM
87
1
0
02 Oct 2024
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
On Behalf of the Stakeholders: Trends in NLP Model Interpretability in the Era of LLMs
Nitay Calderon
Roi Reichart
47
13
0
27 Jul 2024
Representing Rule-based Chatbots with Transformers
Representing Rule-based Chatbots with Transformers
Dan Friedman
Abhishek Panigrahi
Danqi Chen
71
1
0
15 Jul 2024
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
A Practical Review of Mechanistic Interpretability for Transformer-Based Language Models
Daking Rai
Yilun Zhou
Shi Feng
Abulhair Saparov
Ziyu Yao
85
22
0
02 Jul 2024
Monitoring Latent World States in Language Models with Propositional
  Probes
Monitoring Latent World States in Language Models with Propositional Probes
Jiahai Feng
Stuart Russell
Jacob Steinhardt
HILM
48
8
0
27 Jun 2024
Learned feature representations are biased by complexity, learning
  order, position, and more
Learned feature representations are biased by complexity, learning order, position, and more
Andrew Kyle Lampinen
Stephanie C. Y. Chan
Katherine Hermann
AI4CE
FaML
SSL
OOD
45
6
0
09 May 2024
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Sparse Feature Circuits: Discovering and Editing Interpretable Causal Graphs in Language Models
Samuel Marks
Can Rager
Eric J. Michaud
Yonatan Belinkov
David Bau
Aaron Mueller
53
122
0
28 Mar 2024
Patchscopes: A Unifying Framework for Inspecting Hidden Representations
  of Language Models
Patchscopes: A Unifying Framework for Inspecting Hidden Representations of Language Models
Asma Ghandeharioun
Avi Caciularu
Adam Pearce
Lucas Dixon
Mor Geva
39
90
0
11 Jan 2024
Interventionally Consistent Surrogates for Agent-based Simulators
Interventionally Consistent Surrogates for Agent-based Simulators
Joel Dyer
Nicholas Bishop
Yorgos Felekis
Fabio Massimo Zennaro
Anisoara Calinescu
Theodoros Damoulas
Michael Wooldridge
19
6
0
18 Dec 2023
Measuring and Improving Attentiveness to Partial Inputs with
  Counterfactuals
Measuring and Improving Attentiveness to Partial Inputs with Counterfactuals
Yanai Elazar
Bhargavi Paranjape
Hao Peng
Sarah Wiegreffe
Khyathi Raghavi
Vivek Srikumar
Sameer Singh
Noah A. Smith
AAML
OOD
34
0
0
16 Nov 2023
Uncovering Intermediate Variables in Transformers using Circuit Probing
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
78
7
0
07 Nov 2023
Codebook Features: Sparse and Discrete Interpretability for Neural
  Networks
Codebook Features: Sparse and Discrete Interpretability for Neural Networks
Alex Tamkin
Mohammad Taufeeque
Noah D. Goodman
40
27
0
26 Oct 2023
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers
Anna Langedijk
Hosein Mohebbi
Gabriele Sarti
Willem H. Zuidema
Jaap Jumelet
32
10
0
05 Oct 2023
Towards Best Practices of Activation Patching in Language Models:
  Metrics and Methods
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang
Neel Nanda
LLMSV
41
101
0
27 Sep 2023
Sparse Autoencoders Find Highly Interpretable Features in Language
  Models
Sparse Autoencoders Find Highly Interpretable Features in Language Models
Hoagy Cunningham
Aidan Ewart
Logan Riggs
R. Huben
Lee Sharkey
MILM
33
347
0
15 Sep 2023
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing
  Tool for BLIP
Towards Vision-Language Mechanistic Interpretability: A Causal Tracing Tool for BLIP
Vedant Palit
Rohan Pandey
Aryaman Arora
Paul Pu Liang
34
20
0
27 Aug 2023
Arithmetic with Language Models: from Memorization to Computation
Arithmetic with Language Models: from Memorization to Computation
Davide Maltoni
Matteo Ferrara
KELM
LRM
47
5
0
02 Aug 2023
A Geometric Notion of Causal Probing
A Geometric Notion of Causal Probing
Clément Guerner
Anej Svete
Tianyu Liu
Alex Warstadt
Ryan Cotterell
LLMSV
41
12
0
27 Jul 2023
Causal interventions expose implicit situation models for commonsense
  language understanding
Causal interventions expose implicit situation models for commonsense language understanding
Takateru Yamakoshi
James L. McClelland
A. Goldberg
Robert D. Hawkins
34
6
0
06 Jun 2023
Causal Analysis for Robust Interpretability of Neural Networks
Causal Analysis for Robust Interpretability of Neural Networks
Ola Ahmad
Nicolas Béreux
Loïc Baret
V. Hashemi
Freddy Lecue
CML
34
3
0
15 May 2023
Computational modeling of semantic change
Computational modeling of semantic change
Nina Tahmasebi
Haim Dubossarsky
40
6
0
13 Apr 2023
Localizing Model Behavior with Path Patching
Localizing Model Behavior with Path Patching
Nicholas W. Goldowsky-Dill
Chris MacLeod
L. Sato
Aryaman Arora
42
85
0
12 Apr 2023
A Discerning Several Thousand Judgments: GPT-3 Rates the Article +
  Adjective + Numeral + Noun Construction
A Discerning Several Thousand Judgments: GPT-3 Rates the Article + Adjective + Numeral + Noun Construction
Kyle Mahowald
24
24
0
29 Jan 2023
Inducing Character-level Structure in Subword-based Language Models with
  Type-level Interchange Intervention Training
Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training
Jing-ling Huang
Zhengxuan Wu
Kyle Mahowald
Christopher Potts
29
13
0
19 Dec 2022
Causal Abstraction with Soft Interventions
Causal Abstraction with Soft Interventions
Riccardo Massidda
Atticus Geiger
Thomas Icard
D. Bacciu
28
13
0
22 Nov 2022
Interpretability in the Wild: a Circuit for Indirect Object
  Identification in GPT-2 small
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang
Alexandre Variengien
Arthur Conmy
Buck Shlegeris
Jacob Steinhardt
218
515
0
01 Nov 2022
Causal Proxy Models for Concept-Based Model Explanations
Causal Proxy Models for Concept-Based Model Explanations
Zhengxuan Wu
Karel DÓosterlinck
Atticus Geiger
Amir Zur
Christopher Potts
MILM
83
35
0
28 Sep 2022
FACT: Learning Governing Abstractions Behind Integer Sequences
FACT: Learning Governing Abstractions Behind Integer Sequences
Peter Belcak
Ard Kastrati
Flavio Schenker
Roger Wattenhofer
43
5
0
20 Sep 2022
Leveraging Explanations in Interactive Machine Learning: An Overview
Leveraging Explanations in Interactive Machine Learning: An Overview
Stefano Teso
Öznur Alkan
Wolfgang Stammer
Elizabeth M. Daly
XAI
FAtt
LRM
28
62
0
29 Jul 2022
12
Next