
arXiv:2401.12631
A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments

23 January 2024
Zhengxuan Wu
Atticus Geiger
Jing-ling Huang
Aryaman Arora
Thomas F. Icard
Christopher Potts
Noah D. Goodman

Papers citing "A Reply to Makelov et al. (2023)'s "Interpretability Illusion" Arguments"

2 papers shown:

  1. "Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations" by Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas F. Icard, Noah D. Goodman (05 Mar 2023)
  2. "Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small" by Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt (01 Nov 2022)