2410.21514
Sabotage Evaluations for Frontier Models
28 October 2024
Joe Benton
Misha Wagner
Eric Christiansen
Cem Anil
Ethan Perez
Jai Srivastav
Esin Durmus
Deep Ganguli
Shauna Kravec
Buck Shlegeris
Jared Kaplan
Holden Karnofsky
Evan Hubinger
Roger C. Grosse
Samuel R. Bowman
David Duvenaud
ELM
Papers citing "Sabotage Evaluations for Frontier Models"
1 / 1 papers shown
Title
Evaluation Faking: Unveiling Observer Effects in Safety Evaluation of Frontier AI Systems
Yihe Fan
Wenqi Zhang
Xudong Pan
Min Yang
75
0
0
23 May 2025