Experiments with Detecting and Mitigating AI Deception

26 June 2023

Papers citing "Experiments with Detecting and Mitigating AI Deception"


Honesty Is the Best Policy: Defining and Mitigating AI Deception
Francis Rhys Ward, Francesco Belardinelli, Francesca Toni, Tom Everitt
110 27 0
03 Dec 2023

SHAPE: A Framework for Evaluating the Ethicality of Influence
Elfia Bezou-Vrakatseli, Benedikt Brückner, Luke Thorburn
TDI 29 3 0
08 Sep 2023