Parametrically Retargetable Decision-Makers Tend To Seek Power
Alexander Matt Turner, Prasad Tadepalli
arXiv:2206.13477, 27 June 2022
Papers citing "Parametrically Retargetable Decision-Makers Tend To Seek Power" (11 papers shown)

| Title | Authors | Tags | Date |
|---|---|---|---|
| Towards evaluations-based safety cases for AI scheming | Mikita Balesni, Marius Hobbhahn, David Lindner, Alexander Meinke, Tomek Korbak, ..., Dan Braun, Bilal Chughtai, Owain Evans, Daniel Kokotajlo, Lucius Bushnaq | ELM | 29 Oct 2024 |
| Towards shutdownable agents via stochastic choice | Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho, Louis Thomson | | 30 Jun 2024 |
| OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety | Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, ..., Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong | ELM | 18 Mar 2024 |
| A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking | Rose Hadshar | | 27 Oct 2023 |
| AI Systems of Concern | Kayla Matteucci, S. Avin, Fazl Barez, Seán Ó hÉigeartaigh | | 09 Oct 2023 |
| Large Language Model Alignment: A Survey | Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong | LM&MA | 26 Sep 2023 |
| Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback | Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell | ALM, OffRL | 27 Jul 2023 |
| Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety | C. Mitelut, Ben Smith, Peter Vamplew | | 30 May 2023 |
| Power-seeking can be probable and predictive for trained agents | Victoria Krakovna, János Kramár | TDI | 13 Apr 2023 |
| Eight Things to Know about Large Language Models | Sam Bowman | ALM | 02 Apr 2023 |
| The Alignment Problem from a Deep Learning Perspective | Richard Ngo, Lawrence Chan, Sören Mindermann | | 30 Aug 2022 |