2309.15257
STARC: A General Framework For Quantifying Differences Between Reward Functions
26 September 2023
Joar Skalse
Lucy Farnik
S. Motwani
Erik Jenner
Adam Gleave
Alessandro Abate
Papers citing
"STARC: A General Framework For Quantifying Differences Between Reward Functions"
9 / 9 papers shown
Title
Subtask-Aware Visual Reward Learning from Segmented Demonstrations
Changyeon Kim
Minho Heo
Doohyun Lee
Jinwoo Shin
Honglak Lee
Joseph J. Lim
Kimin Lee
44
1
0
28 Feb 2025
Rethinking Reward Model Evaluation: Are We Barking up the Wrong Tree?
Xueru Wen
Jie Lou
Yaojie Lu
Hongyu Lin
Xing Yu
Xinyu Lu
Xianpei Han
Debing Zhang
Le Sun
ALM
61
4
0
17 Feb 2025
On the Partial Identifiability in Reward Learning: Choosing the Best Reward
Filippo Lazzati
Alberto Maria Metelli
38
0
0
10 Jan 2025
Can a Bayesian Oracle Prevent Harm from an Agent?
Yoshua Bengio
Michael K. Cohen
Nikolay Malkin
Matt MacDermott
Damiano Fornasiere
Pietro Greiner
Younesse Kaddar
45
4
0
09 Aug 2024
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
David Dalrymple
Joar Skalse
Yoshua Bengio
Stuart J. Russell
Max Tegmark
...
Clark Barrett
Ding Zhao
Zhi-Xuan Tan
Jeannette Wing
Joshua Tenenbaum
52
52
0
10 May 2024
Quantifying the Sensitivity of Inverse Reinforcement Learning to Misspecification
Joar Skalse
Alessandro Abate
28
3
0
11 Mar 2024
Cooperation and Control in Delegation Games
Oliver Sourbut
Lewis Hammond
Harriet Wood
14
0
0
24 Feb 2024
Goodhart's Law in Reinforcement Learning
Jacek Karwowski
Oliver Hayman
Xingjian Bai
Klaus Kiendlhofer
Charlie Griffin
Joar Skalse
34
9
0
13 Oct 2023
Defining and Characterizing Reward Hacking
Joar Skalse
Nikolaus H. R. Howe
Dmitrii Krasheninnikov
David M. Krueger
59
55
0
27 Sep 2022