2403.13793
Cited By
Evaluating Frontier Models for Dangerous Capabilities
20 March 2024
Mary Phuong
Matthew Aitchison
Elliot Catt
Sarah Cogan
Alex Kaskasoli
Victoria Krakovna
David Lindner
Matthew Rahtz
Yannis Assael
Sarah Hodkinson
Heidi Howard
Tom Lieberum
Ramana Kumar
Maria Abi Raad
Albert Webson
Lewis Ho
Sharon Lin
Sebastian Farquhar
Marcus Hutter
Grégoire Delétang
Anian Ruoss
Seliem El-Sayed
Sasha Brown
Anca Dragan
Rohin Shah
Allan Dafoe
Toby Shevlane
ELM
Papers citing "Evaluating Frontier Models for Dangerous Capabilities"
7 / 7 papers shown
Title
ACSE-Eval: Can LLMs threat model real-world cloud infrastructure?
Sarthak Munshi
Swapnil Pathak
Sonam Ghatode
Thenuga Priyadarshini
Dhivya Chandramouleeswaran
Ashutosh Rana
ELM
154
0
0
16 May 2025
Real-World Gaps in AI Governance Research
Ilan Strauss
Isobel Moure
Tim O'Reilly
Sruly Rosenblat
126
1
0
30 Apr 2025
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
228
16
0
18 Mar 2025
A Framework for Evaluating Emerging Cyberattack Capabilities of AI
Mikel Rodriguez
Raluca Ada Popa
Four Flynn
Lihao Liang
Allan Dafoe
Anna Wang
ELM
117
8
0
14 Mar 2025
Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation
Malcolm Murray
Henry Papadatos
Otter Quarks
Pierre-François Gimenez
Simeon Campos
100
1
0
06 Mar 2025
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang
Chengzhi Hu
Paul Röttger
Barbara Plank
128
10
0
04 Oct 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
76
29
0
11 Jun 2024