Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.01605
Cited By
CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models
2 August 2024
Shengye Wan
Cyrus Nikolaidis
Daniel Song
David Molnar
James Crnkovich
Jayson Grace
Manish P Bhatt
Sahana Chennabasappa
Spencer Whitman
Stephanie Ding
Vlad Ionescu
Yue Li
Joshua Saxe
ELM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models"
17 / 17 papers shown
Title
SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAML
ELM
28
0
0
12 May 2025
RedTeamLLM: an Agentic AI framework for offensive security
Brian Challita
Pierre Parrend
LLMAG
50
0
0
11 May 2025
Security Steerability is All You Need
Itay Hazan
Idan Habler
Ron Bitton
Itsik Mantin
AAML
80
0
0
28 Apr 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
80
0
0
21 Apr 2025
Activated LoRA: Fine-tuned LLMs for Intrinsics
Kristjan Greenewald
Luis A. Lastras
Thomas Parnell
Vraj Shah
Lucian Popa
Giulio Zizzo
Chulaka Gunasekara
Ambrish Rawat
David D. Cox
27
0
0
16 Apr 2025
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design
A. Happe
Jürgen Cito
22
0
0
14 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Yujin Potter
Tianneng Shi
Zhun Wang
Andy Zhang
Dawn Song
52
2
0
07 Apr 2025
What Makes an Evaluation Useful? Common Pitfalls and Best Practices
Gil Gekker
Meirav Segal
Dan Lahav
Omer Nevo
ELM
50
0
0
30 Mar 2025
SandboxEval: Towards Securing Test Environment for Untrusted Code
Rafiqul Rabin
Jesse Hostetler
Sean McGregor
Brett Weir
Nick Judd
ELM
41
0
0
27 Mar 2025
A Framework for Evaluating Emerging Cyberattack Capabilities of AI
Mikel Rodriguez
Raluca Ada Popa
Four Flynn
Lihao Liang
Allan Dafoe
Anna Wang
ELM
69
5
0
14 Mar 2025
Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation
Malcolm Murray
Henry Papadatos
Otter Quarks
Pierre-François Gimenez
Simeon Campos
57
1
0
06 Mar 2025
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
Michael Kouremetis
Marissa Dotter
Alex Byrne
Dan Martin
Ethan Michalak
Gianpaolo Russo
Michael Threet
Guido Zarrella
ELM
52
7
0
18 Feb 2025
LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations
Ziyang Ye
T. H. Le
Muhammad Ali Babar
83
0
0
04 Feb 2025
Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
Jonathan Brokman
Omer Hofman
Oren Rachmil
Inderjeet Singh
Vikas Pahuja
Rathina Sabapathy Aishvariya Priya
Amit Giloni
Roman Vainshtein
Hisashi Kojima
36
2
0
21 Oct 2024
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
Yu Yang
Yuzhou Nie
Zhun Wang
Yuheng Tang
Wenbo Guo
Bo Li
D. Song
ELM
38
6
0
14 Oct 2024
Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models
Fatma Yasmine Loumachi
Mohamed Chahine Ghanem
AI4CE
40
1
0
04 Sep 2024
LLM Agents can Autonomously Exploit One-day Vulnerabilities
Richard Fang
R. Bindu
Akul Gupta
Daniel Kang
SILM
LLMAG
78
55
0
11 Apr 2024
1