arXiv:2408.15116
Evaluating Stability of Unreflective Alignment
27 August 2024
James Lucassen, Mark Henry, Philippa Wright, Owen Yeung

Papers citing "Evaluating Stability of Unreflective Alignment" (5 papers)

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley
07 Mar 2024

LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
Junzhe Chen, Xuming Hu, Shuodi Liu, Shiyu Huang, Weijuan Tu, Zhaofeng He, Lijie Wen
26 Feb 2024

Is Power-Seeking AI an Existential Risk?
Joseph Carlsmith
16 Jun 2022

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022

Risks from Learned Optimization in Advanced Machine Learning Systems
Evan Hubinger, Chris van Merwijk, Vladimir Mikulik, Joar Skalse, Scott Garrabrant
05 Jun 2019