ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.13594
  4. Cited By
Mechanistically Interpreting a Transformer-based 2-SAT Solver: An
  Axiomatic Approach

Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach

18 July 2024
Nils Palumbo
Ravi Mangal
Zifan Wang
Saranya Vijayakumar
Corina S. Pasareanu
Somesh Jha
ArXivPDFHTML

Papers citing "Mechanistically Interpreting a Transformer-based 2-SAT Solver: An Axiomatic Approach"

26 / 26 papers shown
Title
Towards Combinatorial Interpretability of Neural Computation
Towards Combinatorial Interpretability of Neural Computation
Micah Adler
Dan Alistarh
Nir Shavit
FAtt
359
2
0
10 Apr 2025
How to use and interpret activation patching
How to use and interpret activation patching
Stefan Heimersheim
Neel Nanda
71
46
0
23 Apr 2024
Uncovering Intermediate Variables in Transformers using Circuit Probing
Uncovering Intermediate Variables in Transformers using Circuit Probing
Michael A. Lepori
Thomas Serre
Ellie Pavlick
110
7
0
07 Nov 2023
The Clock and the Pizza: Two Stories in Mechanistic Explanation of
  Neural Networks
The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks
Ziqian Zhong
Ziming Liu
Max Tegmark
Jacob Andreas
69
100
0
30 Jun 2023
Faith and Fate: Limits of Transformers on Compositionality
Faith and Fate: Limits of Transformers on Compositionality
Nouha Dziri
Ximing Lu
Melanie Sclar
Xiang Lorraine Li
Liwei Jian
...
Sean Welleck
Xiang Ren
Allyson Ettinger
Zaïd Harchaoui
Yejin Choi
ReLM
LRM
122
376
0
29 May 2023
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Interpretability at Scale: Identifying Causal Mechanisms in Alpaca
Zhengxuan Wu
Atticus Geiger
Thomas Icard
Christopher Potts
Noah D. Goodman
MILM
75
92
0
15 May 2023
Discovering Latent Knowledge in Language Models Without Supervision
Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
128
375
0
07 Dec 2022
Transformers Learn Shortcuts to Automata
Transformers Learn Shortcuts to Automata
Bingbin Liu
Jordan T. Ash
Surbhi Goel
A. Krishnamurthy
Cyril Zhang
OffRL
LRM
126
175
0
19 Oct 2022
Toward Transparent AI: A Survey on Interpreting the Inner Structures of
  Deep Neural Networks
Toward Transparent AI: A Survey on Interpreting the Inner Structures of Deep Neural Networks
Tilman Raukur
A. Ho
Stephen Casper
Dylan Hadfield-Menell
AAML
AI4CE
93
132
0
27 Jul 2022
Probing Classifiers: Promises, Shortcomings, and Advances
Probing Classifiers: Promises, Shortcomings, and Advances
Yonatan Belinkov
269
445
0
24 Feb 2021
Influence Patterns for Explaining Information Flow in BERT
Influence Patterns for Explaining Information Flow in BERT
Kaiji Lu
Zifan Wang
Piotr (Peter) Mardziel
Anupam Datta
GNN
65
16
0
02 Nov 2020
The elephant in the interpretability room: Why use attention as
  explanation when we have saliency methods?
The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
Jasmijn Bastings
Katja Filippova
XAI
LRM
95
177
0
12 Oct 2020
Quantifying Attention Flow in Transformers
Quantifying Attention Flow in Transformers
Samira Abnar
Willem H. Zuidema
157
796
0
02 May 2020
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
On Completeness-aware Concept-Based Explanations in Deep Neural Networks
Chih-Kuan Yeh
Been Kim
Sercan O. Arik
Chun-Liang Li
Tomas Pfister
Pradeep Ravikumar
FAtt
228
305
0
17 Oct 2019
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural
  Networks
Score-CAM: Score-Weighted Visual Explanations for Convolutional Neural Networks
Mehdi Neshat
Zifan Wang
Bradley Alexander
Fan Yang
Zijian Zhang
Sirui Ding
Markus Wagner
Xia Hu
FAtt
93
1,069
0
03 Oct 2019
Learning to Deceive with Attention-Based Explanations
Learning to Deceive with Attention-Based Explanations
Danish Pruthi
Mansi Gupta
Bhuwan Dhingra
Graham Neubig
Zachary Chase Lipton
74
193
0
17 Sep 2019
Designing and Interpreting Probes with Control Tasks
Designing and Interpreting Probes with Control Tasks
John Hewitt
Percy Liang
70
536
0
08 Sep 2019
Attention is not not Explanation
Attention is not not Explanation
Sarah Wiegreffe
Yuval Pinter
XAI
AAML
FAtt
120
909
0
13 Aug 2019
BERT Rediscovers the Classical NLP Pipeline
BERT Rediscovers the Classical NLP Pipeline
Ian Tenney
Dipanjan Das
Ellie Pavlick
MILM
SSeg
135
1,471
0
15 May 2019
Attention is not Explanation
Attention is not Explanation
Sarthak Jain
Byron C. Wallace
FAtt
145
1,324
0
26 Feb 2019
RISE: Randomized Input Sampling for Explanation of Black-box Models
RISE: Randomized Input Sampling for Explanation of Black-box Models
Vitali Petsiuk
Abir Das
Kate Saenko
FAtt
181
1,170
0
19 Jun 2018
Interpretability Beyond Feature Attribution: Quantitative Testing with
  Concept Activation Vectors (TCAV)
Interpretability Beyond Feature Attribution: Quantitative Testing with Concept Activation Vectors (TCAV)
Been Kim
Martin Wattenberg
Justin Gilmer
Carrie J. Cai
James Wexler
F. Viégas
Rory Sayres
FAtt
214
1,842
0
30 Nov 2017
Interpretable Explanations of Black Boxes by Meaningful Perturbation
Interpretable Explanations of Black Boxes by Meaningful Perturbation
Ruth C. Fong
Andrea Vedaldi
FAtt
AAML
74
1,520
0
11 Apr 2017
Axiomatic Attribution for Deep Networks
Axiomatic Attribution for Deep Networks
Mukund Sundararajan
Ankur Taly
Qiqi Yan
OOD
FAtt
188
5,989
0
04 Mar 2017
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based
  Localization
Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization
Ramprasaath R. Selvaraju
Michael Cogswell
Abhishek Das
Ramakrishna Vedantam
Devi Parikh
Dhruv Batra
FAtt
297
20,023
0
07 Oct 2016
Deep Inside Convolutional Networks: Visualising Image Classification
  Models and Saliency Maps
Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan
Andrea Vedaldi
Andrew Zisserman
FAtt
312
7,295
0
20 Dec 2013
1