ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models

2 August 2024
Shengye Wan
Cyrus Nikolaidis
Daniel Song
David Molnar
James Crnkovich
Jayson Grace
Manish P Bhatt
Sahana Chennabasappa
Spencer Whitman
Stephanie Ding
Vlad Ionescu
Yue Li
Joshua Saxe
    ELM

Papers citing "CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models"

SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui
Wei Liu
AAML
ELM
28
0
0
12 May 2025
RedTeamLLM: an Agentic AI framework for offensive security
Brian Challita
Pierre Parrend
LLMAG
50
0
0
11 May 2025
Security Steerability is All You Need
Itay Hazan
Idan Habler
Ron Bitton
Itsik Mantin
AAML
80
0
0
28 Apr 2025
aiXamine: Simplified LLM Safety and Security
Fatih Deniz
Dorde Popovic
Yazan Boshmaf
Euisuh Jeong
M. Ahmad
Sanjay Chawla
Issa M. Khalil
ELM
80
0
0
21 Apr 2025
Activated LoRA: Fine-tuned LLMs for Intrinsics
Kristjan Greenewald
Luis A. Lastras
Thomas Parnell
Vraj Shah
Lucian Popa
Giulio Zizzo
Chulaka Gunasekara
Ambrish Rawat
David D. Cox
27
0
0
16 Apr 2025
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design
A. Happe
Jürgen Cito
22
0
0
14 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape
Wenbo Guo
Yujin Potter
Tianneng Shi
Zhun Wang
Andy Zhang
Dawn Song
52
2
0
07 Apr 2025
What Makes an Evaluation Useful? Common Pitfalls and Best Practices
Gil Gekker
Meirav Segal
Dan Lahav
Omer Nevo
ELM
50
0
0
30 Mar 2025
SandboxEval: Towards Securing Test Environment for Untrusted Code
Rafiqul Rabin
Jesse Hostetler
Sean McGregor
Brett Weir
Nick Judd
ELM
41
0
0
27 Mar 2025
A Framework for Evaluating Emerging Cyberattack Capabilities of AI
Mikel Rodriguez
Raluca Ada Popa
Four Flynn
Lihao Liang
Allan Dafoe
Anna Wang
ELM
69
5
0
14 Mar 2025
Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation
Malcolm Murray
Henry Papadatos
Otter Quarks
Pierre-François Gimenez
Simeon Campos
57
1
0
06 Mar 2025
OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
Michael Kouremetis
Marissa Dotter
Alex Byrne
Dan Martin
Ethan Michalak
Gianpaolo Russo
Michael Threet
Guido Zarrella
ELM
52
7
0
18 Feb 2025
LLMSecConfig: An LLM-Based Approach for Fixing Software Container Misconfigurations
Ziyang Ye
T. H. Le
Muhammad Ali Babar
83
0
0
04 Feb 2025
Insights and Current Gaps in Open-Source LLM Vulnerability Scanners: A Comparative Analysis
Jonathan Brokman
Omer Hofman
Oren Rachmil
Inderjeet Singh
Vikas Pahuja
Rathina Sabapathy Aishvariya Priya
Amit Giloni
Roman Vainshtein
Hisashi Kojima
36
2
0
21 Oct 2024
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
Yu Yang
Yuzhou Nie
Zhun Wang
Yuheng Tang
Wenbo Guo
Bo Li
D. Song
ELM
38
6
0
14 Oct 2024
Advancing Cyber Incident Timeline Analysis Through Rule Based AI and Large Language Models
Fatma Yasmine Loumachi
Mohamed Chahine Ghanem
AI4CE
40
1
0
04 Sep 2024
LLM Agents can Autonomously Exploit One-day Vulnerabilities
Richard Fang
R. Bindu
Akul Gupta
Daniel Kang
SILM
LLMAG
78
55
0
11 Apr 2024