Evaluating Stability of Unreflective Alignment

27 August 2024
James Lucassen
Mark Henry
Philippa Wright
Owen Yeung
arXiv: 2408.15116

Papers citing "Evaluating Stability of Unreflective Alignment"

The Shutdown Problem: An AI Engineering Puzzle for Decision Theorists
Elliott Thornley
65
9
0
07 Mar 2024
LLMArena: Assessing Capabilities of Large Language Models in Dynamic Multi-Agent Environments
Junzhe Chen
Xuming Hu
Shuodi Liu
Shiyu Huang
Weijuan Tu
Zhaofeng He
Lijie Wen
ELM, LLMAG
80
11
0
26 Feb 2024
Is Power-Seeking AI an Existential Risk?
Joseph Carlsmith
ELM
62
87
0
16 Jun 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM, ALM
880
12,973
0
04 Mar 2022
Risks from Learned Optimization in Advanced Machine Learning Systems
Evan Hubinger
Chris van Merwijk
Vladimir Mikulik
Joar Skalse
Scott Garrabrant
89
152
0
05 Jun 2019