LLM Agents can Autonomously Exploit One-day Vulnerabilities

11 April 2024

Papers citing "LLM Agents can Autonomously Exploit One-day Vulnerabilities"

19 / 19 papers shown

| Title | Authors | Tags | | Date |
|---|---|---|---|---|
| Hierarchical Error Assessment of CAD Models for Aircraft Manufacturing-and-Measurement | Jin Huang, Honghua Chen, Mingqiang Wei | | 107 0 0 | 12 Jun 2025 |
| VideoMat: Extracting PBR Materials from Video Diffusion Models | Jacob Munkberg, Zian Wang, Ruofan Liang, Tianchang Shen, J. Hasselgren | DiffM, VGen | 109 6 0 | 11 Jun 2025 |
| Improving LLM Agents with Reinforcement Learning on Cryptographic CTF Challenges | Lajos Muzsai, David Imolai, András Lukács | LLMAG, LRM | 18 0 0 | 01 Jun 2025 |
| DefenderBench: A Toolkit for Evaluating Language Agents in Cybersecurity Environments | Chiyu Zhang, Marc-Alexandre Cote, Michael Albada, Anush Sankaran, Jack W. Stokes, Tong Wang, Amir H. Abdi, William Blum, Muhammad Abdul-Mageed | LLMAG, AAML, ELM | 51 0 0 | 31 May 2025 |
| Dynamic Risk Assessments for Offensive Cybersecurity Agents | Boyi Wei, Benedikt Stroebl, Jiacen Xu, Joie Zhang, Zhou Li, Peter Henderson | | 82 0 0 | 23 May 2025 |
| RedTeamLLM: an Agentic AI framework for offensive security | Brian Challita, Pierre Parrend | LLMAG | 133 0 0 | 11 May 2025 |
| Benchmarking Practices in LLM-driven Offensive Security: Testbeds, Metrics, and Experiment Design | A. Happe, Jürgen Cito | | 88 1 0 | 14 Apr 2025 |
| Frontier AI's Impact on the Cybersecurity Landscape | Wenbo Guo, Yujin Potter, Tianneng Shi, Zhun Wang, Andy Zhang, Dawn Song | | 111 2 0 | 07 Apr 2025 |
| CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities | Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, ..., Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang | LLMAG, ELM | 128 7 0 | 21 Mar 2025 |
| Mapping AI Benchmark Data to Quantitative Risk Estimates Through Expert Elicitation | Malcolm Murray, Henry Papadatos, Otter Quarks, Pierre-François Gimenez, Simeon Campos | | 115 1 0 | 06 Mar 2025 |
| Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements | I. Isozaki, Manil Shrestha, Rick Console, Edward Kim | ELM | 120 7 0 | 24 Feb 2025 |
| Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing | Masaya Kobayashi, Masane Fuchi, Amar Zanashir, Tomonori Yoneda, Tomohiro Takagi | LLMAG | 122 2 0 | 24 Feb 2025 |
| Generative AI for Internet of Things Security: Challenges and Opportunities | Yan Lin Aung, Ivan Christian, Ye Dong, Xiaodong Ye, Sudipta Chattopadhyay, Jianying Zhou | | 100 1 0 | 13 Feb 2025 |
| Benchmarking and Defending Against Indirect Prompt Injection Attacks on Large Language Models | Jingwei Yi, Yueqi Xie, Bin Zhu, Emre Kiciman, Guangzhong Sun, Xing Xie, Fangzhao Wu | AAML | 174 82 0 | 28 Jan 2025 |
| From LLMs to LLM-based Agents for Software Engineering: A Survey of Current, Challenges and Future | Haolin Jin, Linghan Huang, Haipeng Cai, Jun Yan, Bo Li, Huaming Chen | | 158 37 0 | 05 Aug 2024 |
| On the Limitations of Compute Thresholds as a Governance Strategy | Sara Hooker | | 122 19 0 | 08 Jul 2024 |
| Teams of LLM Agents can Exploit Zero-Day Vulnerabilities | Richard Fang, Antony Kellermann, Akul Gupta, Qiusi Zhan, Richard Fang, R. Bindu, Daniel Kang | LLMAG | 103 36 0 | 02 Jun 2024 |
| Societal Adaptation to Advanced AI | Jamie Bernardi, Gabriel Mukobi, Hilary Greaves, Lennart Heim, Markus Anderljung | | 111 7 0 | 16 May 2024 |
| TDAG: A Multi-Agent Framework based on Dynamic Task Decomposition and Agent Generation | Yaoxiang Wang, Zhiyong Wu, Junfeng Yao, Jinsong Su | LLMAG | 151 12 0 | 15 Feb 2024 |