Opening the AI black box: program synthesis via mechanistic interpretability

7 February 2024

Papers cited by "Opening the AI black box: program synthesis via mechanistic interpretability"


Attribution Patching Outperforms Automated Circuit Discovery. Aaquib Syed, Can Rager, Arthur Conmy. 16 Oct 2023.
Provably safe systems: the only path to controllable AGI. Max Tegmark, Steve Omohundro. 05 Sep 2023.
Learning the greatest common divisor: explaining transformer predictions. François Charton. 29 Aug 2023.
The Clock and the Pizza: Two Stories in Mechanistic Explanation of Neural Networks. Ziqian Zhong, Ziming Liu, Max Tegmark, Jacob Andreas. 30 Jun 2023.
Discovering Latent Knowledge in Language Models Without Supervision. Collin Burns, Haotian Ye, Dan Klein, Jacob Steinhardt. 07 Dec 2022.
In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova DasSarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
Acquisition of Chess Knowledge in AlphaZero. Thomas McGrath, A. Kapishnikov, Nenad Tomašev, Adam Pearce, Demis Hassabis, Been Kim, Ulrich Paquet, Vladimir Kramnik. 17 Nov 2021.
AI Feynman 2.0: Pareto-optimal symbolic regression exploiting graph modularity. S. Udrescu, A. Tan, Jiahai Feng, Orisvaldo Neto, Tailin Wu, Max Tegmark. 18 Jun 2020.