Getting pwn'd by AI: Penetration Testing with Large Language Models

24 July 2023

Papers citing "Getting pwn'd by AI: Penetration Testing with Large Language Models"

LLMs unlock new paths to monetizing exploits Nicholas Carlini Milad Nasr Edoardo Debenedetti Barry Wang Christopher A. Choquette-Choo Daphne Ippolito Florian Tramèr Matthew Jagielski AAML 23 0 0 16 May 2025
Weaponizing Language Models for Cybersecurity Offensive Operations: Automating Vulnerability Assessment Report Validation; A Review Paper Abdulrahman S Almuhaidib Azlan Mohd Zain Zalmiyah Zakaria Izyan Izzati Kamsani Abdulaziz S Almuhaidib 53 0 0 07 May 2025
Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey Shuang Tian Tao Zhang Qingbin Liu Jiacheng Wang Xuangou Wu ... Ruichen Zhang Feiyu Xiong Zhenhui Yuan Shiwen Mao Dong In Kim 62 0 0 22 Apr 2025
Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design A. Happe Jürgen Cito 27 0 0 14 Apr 2025
Leveraging Machine Learning Techniques in Intrusion Detection Systems for Internet of Things Saeid Jamshidi Amin Nikanjam Nafi Kawser Wazed Foutse Khomh 34 0 0 09 Apr 2025
Frontier AI's Impact on the Cybersecurity Landscape Wenbo Guo Yujin Potter Tianneng Shi Zhun Wang Andy Zhang Dawn Song 60 2 0 07 Apr 2025
AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses Nicholas Carlini Javier Rando Edoardo Debenedetti Milad Nasr F. Tramèr AAML ELM 52 2 0 03 Mar 2025
Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing Masaya Kobayashi Masane Fuchi Amar Zanashir Tomonori Yoneda Tomohiro Takagi LLMAG 55 2 0 24 Feb 2025
RapidPen: Fully Automated IP-to-Shell Penetration Testing with LLM-based Agents Sho Nakatani 63 2 0 23 Feb 2025
Do LLMs Consider Security? An Empirical Study on Responses to Programming Questions Amirali Sajadi Binh Le A. Nguyen Kostadin Damevski Preetha Chatterjee 66 2 0 20 Feb 2025
Generative AI for Internet of Things Security: Challenges and Opportunities Yan Lin Aung Ivan Christian Ye Dong Xiaodong Ye Sudipta Chattopadhyay Jianying Zhou 61 1 0 13 Feb 2025
Generative Artificial Intelligence-Supported Pentesting: A Comparison between Claude Opus, GPT-4, and Copilot Antonio López Martínez Alejandro Cano Antonio Ruiz-Martínez ELM 52 2 0 12 Jan 2025
HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing Lajos Muzsai David Imolai András Lukács LLMAG 79 9 0 02 Dec 2024
Hacking Back the AI-Hacker: Prompt Injection as a Defense Against LLM-driven Cyberattacks Dario Pasquini Evgenios M. Kornaropoulos G. Ateniese AAML 32 4 0 28 Oct 2024
AutoPenBench: Benchmarking Generative Agents for Penetration Testing Luca Gioacchini Marco Mellia Idilio Drago Alexander Delsanto G. Siracusano Roberto Bifulco ELM 42 6 0 04 Oct 2024
Hackphyr: A Local Fine-Tuned LLM Agent for Network Security Environments M. Rigaki C. Catania Sebastian Garcia LLMAG 42 4 0 17 Sep 2024
Hacking, The Lazy Way: LLM Augmented Pentesting Dhruva Goyal Sitaraman Subramanian Aditya Peela Nisha P. Shetty 41 7 0 14 Sep 2024
CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher Derry Pratama Naufal Suryanto Andro Aprila Adiputra Thi-Thu-Huong Le Ahmada Yusril Kadiptya Muhammad Iqbal Howon Kim 50 8 0 21 Aug 2024
A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares Stav Cohen Ron Bitton Ben Nassi SILM 41 5 0 09 Aug 2024
From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future Haolin Jin Linghan Huang Haipeng Cai Jun Yan Bo Li Huaming Chen 78 30 0 05 Aug 2024
PenHeal: A Two-Stage LLM Framework for Automated Pentesting and Optimal Remediation Junjie Huang Quanyan Zhu 35 18 0 25 Jul 2024
From Sands to Mansions: Towards Automated Cyberattack Emulation with Classical Planning and Large Language Models Lingzhi Wang Zhenyuan Li Zonghan Guo Yi Jiang Kyle Jung Kedar Thiagarajan Jiahui Wang Zhengkai Wang Emily Wei Xiangmin Shen 78 0 0 24 Jul 2024
MoRSE: Bridging the Gap in Cybersecurity Expertise with Retrieval Augmented Generation Marco Simoni Andrea Saracino Vinod Puthuvath Mauro Conti 60 2 0 22 Jul 2024
Teams of LLM Agents can Exploit Zero-Day Vulnerabilities Richard Fang Antony Kellermann Akul Gupta Qiusi Zhan R. Bindu Daniel Kang LLMAG 42 30 0 02 Jun 2024
A Comprehensive Overview of Large Language Models (LLMs) for Cyber Defences: Opportunities and Directions Mohammed Hassanin Nour Moustafa 40 27 0 23 May 2024
When LLMs Meet Cybersecurity: A Systematic Literature Review Jie Zhang Haoyu Bu Hui Wen Yu Chen Lun Li Hongsong Zhu 47 36 0 06 May 2024
Can LLMs Understand Computer Networks? Towards a Virtual System Administrator Denis Donadel Francesco Marchiori Luca Pajola Mauro Conti 36 7 0 19 Apr 2024
LLM Agents can Autonomously Exploit One-day Vulnerabilities Richard Fang R. Bindu Akul Gupta Daniel Kang SILM LLMAG 81 56 0 11 Apr 2024
Just another copy and paste? Comparing the security vulnerabilities of ChatGPT generated code and StackOverflow answers Sivana Hamer Marcelo d'Amorim Laurie A. Williams SILM ELM 40 18 0 22 Mar 2024
Review of Generative AI Methods in Cybersecurity Yagmur Yigit William J. Buchanan Madjid G Tehrani Leandros A. Maglaras AAML 56 20 0 13 Mar 2024
AutoAttacker: A Large Language Model Guided System to Implement Automatic Cyber-attacks Jiacen Xu Jack W. Stokes Geoff McDonald Xuesong Bai David Marshall Siyue Wang Adith Swaminathan Zhou Li 56 51 0 02 Mar 2024
A Survey of Large Language Models in Cybersecurity Gabriel de Jesus Coelho da Silva Carlos Becker Westphall 37 6 0 26 Feb 2024
A Preliminary Study on Using Large Language Models in Software Pentesting Kumar Shashwat Francis Hahn Xinming Ou Dmitry Goldgof Lawrence Hall Jay Ligatti S. R. Rajgopalan Armin Ziaie Tabari LLMAG 33 5 0 30 Jan 2024
Large Language Models in Cybersecurity: State-of-the-Art Farzad Nourmohammadzadeh Motlagh Mehrdad Hajizadeh Mehryar Majd Pejman Najafi Feng Cheng Christoph Meinel ELM 52 43 0 30 Jan 2024
A Survey on Large Language Model (LLM) Security and Privacy: The Good, the Bad, and the Ugly Yifan Yao Jinhao Duan Kaidi Xu Yuanfang Cai Eric Sun Yue Zhang PILM ELM 57 478 0 04 Dec 2023
LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks A. Happe Aaron Kaplan Jürgen Cito 40 16 0 17 Oct 2023
LLM for SoC Security: A Paradigm Shift Dipayan Saha Shams Tarek Katayoon Yahyaei S. Saha Jingbo Zhou M. Tehranipoor Farimah Farahmandi 69 48 0 09 Oct 2023
Cyber Sentinel: Exploring Conversational Agents in Streamlining Security Tasks with GPT-4 Mehrdad Kaheh Danial Khosh Kholgh Panos Kostakos 51 13 0 28 Sep 2023
Out of the Cage: How Stochastic Parrots Win in Cyber Security Environments M. Rigaki Ondrej Lukás C. Catania Sebastian Garcia LLMAG 25 11 0 23 Aug 2023
Large Language Models for Software Engineering: A Systematic Literature Review Xinying Hou Yanjie Zhao Yue Liu Zhou Yang Kailong Wang Li Li Xiapu Luo David Lo John C. Grundy Haoyu Wang 39 332 0 21 Aug 2023
Understanding Hackers' Work: An Empirical Study of Offensive Security Practitioners A. Happe Jürgen Cito 19 10 0 14 Aug 2023
Generative Agents: Interactive Simulacra of Human Behavior J. Park Joseph C. O'Brien Carrie J. Cai Meredith Ringel Morris Percy Liang Michael S. Bernstein LM&Ro AI4CE 244 1,772 0 07 Apr 2023
Learning to Prompt for Vision-Language Models Kaiyang Zhou Jingkang Yang Chen Change Loy Ziwei Liu VPVLM CLIP VLM 350 2,286 0 02 Sep 2021