ResearchTrend.AI
Parametrically Retargetable Decision-Makers Tend To Seek Power
Alexander Matt Turner, Prasad Tadepalli
arXiv:2206.13477, 27 June 2022

Papers citing "Parametrically Retargetable Decision-Makers Tend To Seek Power" (11 of 11 papers shown)
Towards evaluations-based safety cases for AI scheming
Mikita Balesni, Marius Hobbhahn, David Lindner, Alexander Meinke, Tomek Korbak, ..., Dan Braun, Bilal Chughtai, Owain Evans, Daniel Kokotajlo, Lucius Bushnaq
29 Oct 2024
Towards shutdownable agents via stochastic choice
Elliott Thornley, Alexander Roman, Christos Ziakas, Leyton Ho, Louis Thomson
30 Jun 2024
OpenEval: Benchmarking Chinese LLMs across Capability, Alignment and Safety
Chuang Liu, Linhao Yu, Jiaxuan Li, Renren Jin, Yufei Huang, ..., Tao Liu, Jinwang Song, Hongying Zan, Sun Li, Deyi Xiong
18 Mar 2024
A Review of the Evidence for Existential Risk from AI via Misaligned Power-Seeking
Rose Hadshar
27 Oct 2023
AI Systems of Concern
Kayla Matteucci, S. Avin, Fazl Barez, Seán Ó hÉigeartaigh
09 Oct 2023
Large Language Model Alignment: A Survey
Tianhao Shen, Renren Jin, Yufei Huang, Chuang Liu, Weilong Dong, Zishan Guo, Xinwei Wu, Yan Liu, Deyi Xiong
26 Sep 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
27 Jul 2023
Intent-aligned AI systems deplete human agency: the need for agency foundations research in AI safety
C. Mitelut, Ben Smith, Peter Vamplew
30 May 2023
Power-seeking can be probable and predictive for trained agents
Victoria Krakovna, János Kramár
13 Apr 2023
Eight Things to Know about Large Language Models
Sam Bowman
02 Apr 2023
The Alignment Problem from a Deep Learning Perspective
Richard Ngo, Lawrence Chan, Sören Mindermann
30 Aug 2022