Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent
arXiv:2407.12161 · 16 July 2024

Karolis Jucys, George Adamopoulos, Mehrab Hamidi, Stephanie Milani, Mohammad Reza Samsami, Artem Zholus, Sonia Joseph, Blake A. Richards, Irina Rish, Özgür Simsek

Papers citing "Interpretability in Action: Exploratory Analysis of VPT, a Minecraft Agent" (13 papers):

How to use and interpret activation patching
Stefan Heimersheim, Neel Nanda · 23 Apr 2024

Explaining Reinforcement Learning with Shapley Values
Daniel Beechey, Thomas M. S. Smith, Özgür Simsek · TDI, FAtt · 09 Jun 2023

Towards Automated Circuit Discovery for Mechanistic Interpretability
Arthur Conmy, Augustine N. Mavor-Parker, Aengus Lynch, Stefan Heimersheim, Adrià Garriga-Alonso · 28 Apr 2023

Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small
Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt · 01 Nov 2022

On Feature Learning in the Presence of Spurious Correlations
Pavel Izmailov, Polina Kirichenko, Nate Gruver, A. Wilson · 20 Oct 2022

In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah · 24 Sep 2022

Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Bowen Baker, Ilge Akkaya, Peter Zhokhov, Joost Huizinga, Jie Tang, Adrien Ecoffet, Brandon Houghton, Raul Sampedro, Jeff Clune · OffRL · 23 Jun 2022

Underspecification Presents Challenges for Credibility in Modern Machine Learning
Alexander D'Amour, Katherine A. Heller, D. Moldovan, Ben Adlam, B. Alipanahi, ..., Kellie Webster, Steve Yadlowsky, T. Yun, Xiaohua Zhai, D. Sculley · OffRL · 06 Nov 2020

Explainable Reinforcement Learning: A Survey
Erika Puiutta, Eric M. S. P. Veith · XAI · 13 May 2020

MineRL: A Large-Scale Dataset of Minecraft Demonstrations
William H. Guss, Brandon Houghton, Nicholay Topin, Phillip Wang, Cayden R. Codel, Manuela Veloso, Ruslan Salakhutdinov · OffRL · 29 Jul 2019

Sanity Checks for Saliency Maps
Julius Adebayo, Justin Gilmer, M. Muelly, Ian Goodfellow, Moritz Hardt, Been Kim · FAtt, AAML, XAI · 08 Oct 2018

"Why Should I Trust You?": Explaining the Predictions of Any Classifier
Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin · FAtt, FaML · 16 Feb 2016

Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps
Karen Simonyan, Andrea Vedaldi, Andrew Zisserman · FAtt · 20 Dec 2013