2106.04235
Cited By
Definitions of intent suitable for algorithms
8 June 2021
Hal Ashton
Papers citing
"Definitions of intent suitable for algorithms"
9 / 9 papers shown
Title
Evaluating Language Model Character Traits
Francis Rhys Ward
Zejia Yang
Alex Jackson
Randy Brown
Chandler Smith
Grace Colverd
Louis Thomson
Raymond Douglas
Patrik Bartak
Andrew Rowan
47
0
0
05 Oct 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
52
22
0
11 Jun 2024
The Reasons that Agents Act: Intention and Instrumental Goals
Francis Rhys Ward
Matt MacDermott
Francesco Belardinelli
Francesca Toni
Tom Everitt
AI4CE
29
12
0
11 Feb 2024
Honesty Is the Best Policy: Defining and Mitigating AI Deception
Francis Rhys Ward
Francesco Belardinelli
Francesca Toni
Tom Everitt
112
27
0
03 Dec 2023
SHAPE: A Framework for Evaluating the Ethicality of Influence
Elfia Bezou-Vrakatseli
Benedikt Brückner
Luke Thorburn
TDI
34
3
0
08 Sep 2023
Experiments with Detecting and Mitigating AI Deception
Ismail Sahbane
Francis Rhys Ward
Henrik Åslund
23
1
0
26 Jun 2023
Human Control: Definitions and Algorithms
Ryan Carey
Tom Everitt
32
6
0
31 May 2023
What is Proxy Discrimination?
Michael Carl Tschantz
6
18
0
11 May 2022
Extending counterfactual accounts of intent to include oblique intent
Hal Ashton
13
3
0
07 Jun 2021