Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
arXiv:2407.12161 · 16 July 2024

Karolis Jucys, George Adamopoulos, Mehrab Hamidi, Stephanie Milani, Mohammad Reza Samsami, Artem Zholus, Sonia Joseph, Blake A. Richards, Irina Rish, Özgür Simsek

Papers citing "Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent" (13 papers):

How to use and interpret activation patching
Stefan Heimersheim, Neel Nanda · 23 Apr 2024

Explaining Reinforcement Learning with Shapley Values
Daniel Beechey, Thomas M. S. Smith, Özgür Simsek · TDI, FAtt · 09 Jun 2023

Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso · 28 Apr 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt · 01 Nov 2022

On Feature Learning in the Presence of Spurious Correlations
Pavel Izmailov, Polina Kirichenko, Nate Gruver, A. Wilson · 20 Oct 2022

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah · 24 Sep 2022

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune · OffRL · 23 Jun 2022

Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D'Amour, Katherine A. Heller, D. Moldovan, Ben Adlam, B. Alipanahi, ..., Kellie Webster, Steve Yadlowsky, T. Yun, Xiaohua Zhai, D. Sculley · OffRL · 06 Nov 2020

Explainable Reinforcement Learning: A Survey
Erika Puiutta, Eric M. S. P. Veith · XAI · 13 May 2020

MineRL: A Large-Scale Dataset of Minecraft Demonstrations
William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden R. Codel, Manuela Veloso, Ruslan Salakhutdinov · OffRL · 29 Jul 2019

Sanity Checks for Saliency Maps
Julius Adebayo, Justin Gilmer, M. Muelly, Ian Goodfellow, Moritz Hardt, Been Kim · FAtt, AAML, XAI · 08 Oct 2018

"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin · FAtt, FaML · 16 Feb 2016

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan, Andrea Vedaldi, Andrew Zisserman · FAtt · 20 Dec 2013