Value Alignment Verification

2 December 2020

Papers citing "Value Alignment Verification"


Title | Authors | Tags | Counts | Date
High-Dimension Human Value Representation in Large Language Models | Samuel Cahyawijaya, Delong Chen, Yejin Bang, Leila Khalatbari, Bryan Wilie, Ziwei Ji, Etsuko Ishii, Pascale Fung | - | 71 5 0 | 11 Apr 2024
Effect of Adapting to Human Preferences on Trust in Human-Robot Teaming | Shreyas Bhat, Joseph B. Lyons, Cong Shi, X. J. Yang | - | 20 3 0 | 11 Sep 2023
Defining and Characterizing Reward Hacking | Joar Skalse, Nikolaus H. R. Howe, Dmitrii Krasheninnikov, David M. Krueger | - | 59 56 0 | 27 Sep 2022
Negotiating Team Formation Using Deep Reinforcement Learning | Yoram Bachrach, Richard Everett, Edward Hughes, Angeliki Lazaridou, Joel Z Leibo, Marc Lanctot, Michael Bradley Johanson, Wojciech M. Czarnecki, T. Graepel | - | 43 35 0 | 20 Oct 2020
Decoupling Representation Learning from Reinforcement Learning | Adam Stooke, Kimin Lee, Pieter Abbeel, Michael Laskin | SSL, DRL | 284 341 0 | 14 Sep 2020