1712.06365
Cited By
v1
v2
v3
v4 (latest)
'Indifference' methods for managing agent rewards
18 December 2017
Stuart Armstrong
Xavier O'Rourke
Papers citing
"'Indifference' methods for managing agent rewards"
13 / 13 papers shown
Title
Towards shutdownable agents via stochastic choice
Elliott Thornley
Alexander Roman
Christos Ziakas
Leyton Ho
Louis Thomson
120
0
0
30 Jun 2024
AGI Safety Literature Review
Tom Everitt
G. Lea
Marcus Hutter
AI4CE
76
116
0
03 May 2018
Counterfactual equivalence for POMDPs, and underlying deterministic environments
Stuart Armstrong
47
2
0
11 Jan 2018
AI Safety Gridworlds
Jan Leike
Miljan Martic
Victoria Krakovna
Pedro A. Ortega
Tom Everitt
Andrew Lefrancq
Laurent Orseau
Shane Legg
138
255
0
27 Nov 2017
Good and safe uses of AI Oracles
Stuart Armstrong
Xavier O'Rourke
138
27
0
15 Nov 2017
Inverse Reward Design
Dylan Hadfield-Menell
S. Milli
Pieter Abbeel
Stuart J. Russell
Anca Dragan
93
400
0
08 Nov 2017
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
220
3,380
0
12 Jun 2017
Should Robots be Obedient?
S. Milli
Dylan Hadfield-Menell
Anca Dragan
Stuart J. Russell
69
59
0
28 May 2017
Enter the Matrix: Safely Interruptible Autonomous Systems via Virtualization
Mark O. Riedl
Brent Harrison
47
7
0
30 Mar 2017
Concrete Problems in AI Safety
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané
260
2,405
0
21 Jun 2016
Self-Modification of Policy and Utility Function in Rational Agents
Tom Everitt
Daniel Filan
Mayank Daswani
Marcus Hutter
59
29
0
10 May 2016
Learning the Preferences of Ignorant, Inconsistent Agents
Owain Evans
Andreas Stuhlmüller
Noah D. Goodman
78
128
0
18 Dec 2015
Can Intelligence Explode?
Marcus Hutter
ELM
66
29
0
28 Feb 2012