Real Sparks of Artificial Intelligence and the Importance of Inner Interpretability

31 January 2024

Papers citing "Real Sparks of Artificial Intelligence and the Importance of Inner Interpretability"

2 / 2 papers shown

Title
Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small | Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt | 212 | 497 | 0 | 01 Nov 2022
Probing Classifiers: Promises, Shortcomings, and Advances | Yonatan Belinkov | 226 | 405 | 0 | 24 Feb 2021