ResearchTrend.AI
Evaluating Frontier Models for Dangerous Capabilities
arXiv: 2403.13793 (versions v1, v2; v2 is the latest)

20 March 2024
Mary Phuong
Matthew Aitchison
Elliot Catt
Sarah Cogan
Alex Kaskasoli
Victoria Krakovna
David Lindner
Matthew Rahtz
Yannis Assael
Sarah Hodkinson
Heidi Howard
Tom Lieberum
Ramana Kumar
Maria Abi Raad
Albert Webson
Lewis Ho
Sharon Lin
Sebastian Farquhar
Marcus Hutter
Grégoire Delétang
Anian Ruoss
Seliem El-Sayed
Sasha Brown
Anca Dragan
Rohin Shah
Allan Dafoe
Toby Shevlane
    ELM

Papers citing "Evaluating Frontier Models for Dangerous Capabilities"

7 / 7 papers shown
ACSE-Eval: Can LLMs threat model real-world cloud infrastructure?
Sarthak Munshi
Swapnil Pathak
Sonam Ghatode
Thenuga Priyadarshini
Dhivya Chandramouleeswaran
Ashutosh Rana
ELM
154
0
0
16 May 2025
Real-World Gaps in AI Governance Research
Ilan Strauss
Isobel Moure
Tim O'Reilly
Sruly Rosenblat
126
1
0
30 Apr 2025
Measuring AI Ability to Complete Long Tasks
Thomas Kwa
Ben West
Joel Becker
Amy Deng
Katharyn Garcia
...
Lucas Jun Koba Sato
H. Wijk
Daniel M. Ziegler
Elizabeth Barnes
Lawrence Chan
ELM
228
16
0
18 Mar 2025
A Framework for Evaluating Emerging Cyberattack Capabilities of AI
Mikel Rodriguez
Raluca Ada Popa
Four Flynn
Lihao Liang
Allan Dafoe
Anna Wang
ELM
117
8
0
14 Mar 2025
Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation
Malcolm Murray
Henry Papadatos
Otter Quarks
Pierre-François Gimenez
Simeon Campos
100
1
0
06 Mar 2025
Surgical, Cheap, and Flexible: Mitigating False Refusal in Language Models via Single Vector Ablation
Xinpeng Wang
Chengzhi Hu
Paul Röttger
Barbara Plank
128
10
0
04 Oct 2024
AI Sandbagging: Language Models can Strategically Underperform on Evaluations
Teun van der Weij
Felix Hofstätter
Ollie Jaffe
Samuel F. Brown
Francis Rhys Ward
ELM
76
29
0
11 Jun 2024