arXiv:1912.05652
Learning Human Objectives by Evaluating Hypothetical Behavior
5 December 2019
S. Reddy
Anca Dragan
Sergey Levine
Shane Legg
Jan Leike
Papers citing "Learning Human Objectives by Evaluating Hypothetical Behavior"
21 / 21 papers shown
Learning Interpretable Models of Aircraft Handling Behaviour by Reinforcement Learning from Human Feedback
Tom Bewley
J. Lawry
Arthur G. Richards
30
1
0
26 May 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
35
127
0
06 Apr 2023
A Human-Centered Safe Robot Reinforcement Learning Framework with Interactive Behaviors
Shangding Gu
Alap Kshirsagar
Yali Du
Guang Chen
Jan Peters
Alois C. Knoll
36
14
0
25 Feb 2023
On The Fragility of Learned Reward Functions
Lev McKinney
Yawen Duan
David M. Krueger
Adam Gleave
33
20
0
09 Jan 2023
Benchmarks and Algorithms for Offline Preference-Based Reward Learning
Daniel Shin
Anca Dragan
Daniel S. Brown
OffRL
17
53
0
03 Jan 2023
Time-Efficient Reward Learning via Visually Assisted Cluster Ranking
David Zhang
Micah Carroll
Andreea Bobu
Anca Dragan
26
4
0
30 Nov 2022
Reward Learning with Trees: Methods and Evaluation
Tom Bewley
J. Lawry
Arthur G. Richards
R. Craddock
Ian Henderson
23
1
0
03 Oct 2022
Law Informs Code: A Legal Informatics Approach to Aligning Artificial Intelligence with Humans
John J. Nay
ELM
AILaw
88
27
0
14 Sep 2022
Negative Human Rights as a Basis for Long-term AI Safety and Regulation
Ondrej Bajgar
Jan Horenovsky
FaML
24
9
0
31 Aug 2022
Forecasting Future World Events with Neural Networks
Andy Zou
Tristan Xiao
Ryan Jia
Joe Kwon
Mantas Mazeika
Richard Li
Dawn Song
Jacob Steinhardt
Owain Evans
Dan Hendrycks
30
22
0
30 Jun 2022
Aligning to Social Norms and Values in Interactive Narratives
Prithviraj Ammanabrolu
Liwei Jiang
Maarten Sap
Hannaneh Hajishirzi
Yejin Choi
AI4CE
28
47
0
04 May 2022
Safe Deep RL in 3D Environments using Human Feedback
Matthew Rahtz
Vikrant Varma
Ramana Kumar
Zachary Kenton
Shane Legg
Jan Leike
32
4
0
20 Jan 2022
Inducing Structure in Reward Learning by Learning Features
Andreea Bobu
Marius Wiggert
Claire Tomlin
Anca Dragan
27
30
0
18 Jan 2022
On Optimizing Interventions in Shared Autonomy
Weihao Tan
David Koleczek
Siddhant Pradhan
Nicholas Perello
Vivek Chettiar
Vishal Rohra
Aaslesha Rajaram
Soundararajan Srinivasan
H. M. S. Hossain
Yash Chandak
31
4
0
16 Dec 2021
Learning Perceptual Concepts by Bootstrapping from Human Queries
Andreea Bobu
Chris Paxton
Wei Yang
Balakumar Sundaralingam
Yu-Wei Chao
Maya Cakmak
Dieter Fox
SSL
35
17
0
09 Nov 2021
B-Pref: Benchmarking Preference-Based Reinforcement Learning
Kimin Lee
Laura M. Smith
Anca Dragan
Pieter Abbeel
OffRL
42
93
0
04 Nov 2021
Play to Grade: Testing Coding Games as Classifying Markov Decision Process
Allen Nie
Emma Brunskill
Chris Piech
29
11
0
27 Oct 2021
What Would Jiminy Cricket Do? Towards Agents That Behave Morally
Dan Hendrycks
Mantas Mazeika
Andy Zou
Sahil Patel
Christine Zhu
Jesus Navarro
D. Song
Bo-wen Li
Jacob Steinhardt
16
58
0
25 Oct 2021
Avoiding Tampering Incentives in Deep RL via Decoupled Approval
J. Uesato
Ramana Kumar
Victoria Krakovna
Tom Everitt
Richard Ngo
Shane Legg
26
14
0
17 Nov 2020
Feature Expansive Reward Learning: Rethinking Human Input
Andreea Bobu
Marius Wiggert
Claire Tomlin
Anca Dragan
27
44
0
23 Jun 2020
Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles
Balaji Lakshminarayanan
Alexander Pritzel
Charles Blundell
UQCV
BDL
276
5,683
0
05 Dec 2016