2504.10112
v1
v2 (latest)
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design
14 April 2025
A. Happe
Jürgen Cito
Papers citing "Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design"
22 / 22 papers shown
Title
Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
I. Isozaki
Manil Shrestha
Rick Console
Edward Kim
ELM
102
7
0
24 Feb 2025
RapidPen: Fully Automated IP-to-Shell Penetration Testing with LLM-based Agents
Sho Nakatani
117
3
1
23 Feb 2025
Can LLMs Hack Enterprise Networks? Autonomous Assumed Breach Penetration-Testing Active Directory Networks
A. Happe
Jürgen Cito
96
4
0
06 Feb 2025
HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
Lajos Muzsai
David Imolai
András Lukács
LLMAG
113
12
0
02 Dec 2024
AutoPT: How Far Are We from the End2End Automated Web Penetration Testing?
Benlong Wu
Guoqiang Chen
Kejiang Chen
Xiuwei Shang
Jiapeng Han
Yanru He
Weinan Zhang
Nenghai Yu
LLMAG
69
5
0
02 Nov 2024
AutoPenBench: Benchmarking Generative Agents for Penetration Testing
Luca Gioacchini
Marco Mellia
Idilio Drago
Alexander Delsanto
G. Siracusano
Roberto Bifulco
ELM
65
6
0
04 Oct 2024
CYBERSECEVAL 3: Advancing the Evaluation of Cybersecurity Risks and Capabilities in Large Language Models
Shengye Wan
Cyrus Nikolaidis
Daniel Song
David Molnar
James Crnkovich
...
Spencer Whitman
Stephanie Ding
Vlad Ionescu
Yue Li
Joshua Saxe
ELM
86
22
0
02 Aug 2024
PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation
Junjie Huang
Quanyan Zhu
61
21
0
25 Jul 2024
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Richard Fang
Antony Kellermann
Akul Gupta
Qiusi Zhan
R. Bindu
Daniel Kang
LLMAG
86
35
0
02 Jun 2024
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions
Mohammed Hassanin
Nour Moustafa
75
31
0
23 May 2024
Large Language Models for Cyber Security: A Systematic Literature Review
HanXiang Xu
Shenao Wang
Ningke Li
Kaidi Wang
Yanjie Zhao
Kai Chen
Ting Yu
Yang Liu
Haoyu Wang
111
41
0
08 May 2024
CyberSecEval 2: A Wide-Ranging Cybersecurity Evaluation Suite for Large Language Models
Manish P Bhatt
Sahana Chennabasappa
Yue Li
Cyrus Nikolaidis
Daniel Song
...
Yaohui Chen
Dhaval Kapil
David Molnar
Spencer Whitman
Joshua Saxe
ELM
89
41
0
19 Apr 2024
LLM Agents can Autonomously Exploit One-day Vulnerabilities
Richard Fang
R. Bindu
Akul Gupta
Daniel Kang
SILM
LLMAG
125
66
0
11 Apr 2024
Review of Generative AI Methods in Cybersecurity
Yagmur Yigit
William J. Buchanan
Madjid G Tehrani
Leandros A. Maglaras
AAML
129
23
0
13 Mar 2024
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks
Jiacen Xu
Jack W. Stokes
Geoff McDonald
Xuesong Bai
David Marshall
Siyue Wang
Adith Swaminathan
Zhou Li
90
58
0
02 Mar 2024
An Empirical Evaluation of LLMs for Solving Offensive Security Challenges
Minghao Shao
Boyuan Chen
Sofija Jancheska
Brendan Dolan-Gavitt
Siddharth Garg
Ramesh Karri
Mohamed Bennai
79
30
0
19 Feb 2024
Purple Llama CyberSecEval: A Secure Coding Benchmark for Language Models
Manish P Bhatt
Sahana Chennabasappa
Cyrus Nikolaidis
Shengye Wan
Ivan Evtimov
...
Aleksandar Straumann
Gabriel Synnaeve
Varun Vontimitta
Spencer Whitman
Joshua Saxe
ELM
95
80
0
07 Dec 2023
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly
Yifan Yao
Jinhao Duan
Kaidi Xu
Yuanfang Cai
Eric Sun
Yue Zhang
PILM
ELM
97
545
0
04 Dec 2023
LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
A. Happe
Aaron Kaplan
Jürgen Cito
73
17
0
17 Oct 2023
Understanding Hackers' Work: An Empirical Study of Offensive Security Practitioners
A. Happe
Jürgen Cito
49
11
0
14 Aug 2023
Getting pwn'd by AI: Penetration Testing with Large Language Models
A. Happe
Jürgen Cito
68
83
0
24 Jul 2023
Large Language Models
Michael R Douglas
LLMAG
LM&MA
138
642
0
11 Jul 2023