arXiv:2504.05259
How to evaluate control measures for LLM agents? A trajectory from today to superintelligence
7 April 2025
Tomek Korbak
Mikita Balesni
Buck Shlegeris
Geoffrey Irving
ELM
Papers citing
"How to evaluate control measures for LLM agents? A trajectory from today to superintelligence"
5 / 5 papers shown
Title
An Approach to Technical AGI Safety and Security
Rohin Shah
Alex Irpan
Alexander Matt Turner
Anna Wang
Arthur Conmy
...
Shane Legg
Noah D. Goodman
Allan Dafoe
Four Flynn
Anca Dragan
83
9
0
02 Apr 2025
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
286
18
0
18 Mar 2025
Monitoring Reasoning Models for Misbehavior and the Risks of Promoting Obfuscation
Bowen Baker
Joost Huizinga
Leo Gao
Zehao Dou
M. Guan
Aleksander Mądry
Wojciech Zaremba
J. Pachocki
David Farhi
LRM
186
38
0
14 Mar 2025
A sketch of an AI control safety case
Tomek Korbak
Joshua Clymer
Benjamin Hilton
Buck Shlegeris
Geoffrey Irving
149
10
0
28 Jan 2025
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
89
31
0
11 Jun 2024