ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

LLM Agents can Autonomously Hack Websites
arXiv:2402.06664, 6 February 2024
Richard Fang, R. Bindu, Akul Gupta, Qiusi Zhan, Daniel Kang
Tags: LLMAG

Papers citing "LLM Agents can Autonomously Hack Websites"

35 / 35 papers shown
LLMs unlock new paths to monetizing exploits
Nicholas Carlini, Milad Nasr, Edoardo Debenedetti, Barry Wang, Christopher A. Choquette-Choo, Daphne Ippolito, Florian Tramèr, Matthew Jagielski
Tags: AAML
16 May 2025

A Survey on the Safety and Security Threats of Computer-Using Agents: JARVIS or Ultron?
Ada Chen, Yongjiang Wu, Jingyang Zhang, Shu Yang, Jen-tse Huang, Kun Wang, Wenxuan Wang, Shuai Wang
Tags: ELM
16 May 2025

RedTeamLLM: an Agentic AI framework for offensive security
Brian Challita, Pierre Parrend
Tags: LLMAG
11 May 2025

From Texts to Shields: Convergence of Large Language Models and Cybersecurity
Tao Li, Ya-Ting Yang, Yunian Pan, Quanyan Zhu
01 May 2025

ARCeR: an Agentic RAG for the Automated Definition of Cyber Ranges
Matteo Lupinacci, Francesco Blefari, Francesco Romeo, Francesco Aurelio Pironti, Angelo Furfaro
16 Apr 2025

CVE-Bench: A Benchmark for AI Agents' Ability to Exploit Real-World Web Application Vulnerabilities
Yuxuan Zhu, Antony Kellermann, Dylan Bowman, Philip Li, Akul Gupta, ..., Avi Dhir, Sudhit Rao, Kaicheng Yu, Twm Stone, Daniel Kang
Tags: LLMAG, ELM
21 Mar 2025

AgentDAM: Privacy Leakage Evaluation for Autonomous Web Agents
Arman Zharmagambetov, Chuan Guo, Ivan Evtimov, Maya Pavlova, Ruslan Salakhutdinov, Kamalika Chaudhuri
12 Mar 2025

AutoAdvExBench: Benchmarking autonomous exploitation of adversarial example defenses
Nicholas Carlini, Javier Rando, Edoardo Debenedetti, Milad Nasr, F. Tramèr
Tags: AAML, ELM
03 Mar 2025

Construction and Evaluation of LLM-based agents for Semi-Autonomous penetration testing
Masaya Kobayashi, Masane Fuchi, Amar Zanashir, Tomonori Yoneda, Tomohiro Takagi
Tags: LLMAG
24 Feb 2025

Towards Automated Penetration Testing: Introducing LLM Benchmark, Analysis, and Improvements
I. Isozaki, Manil Shrestha, Rick Console, Edward Kim
Tags: ELM
24 Feb 2025

A Frontier AI Risk Management Framework: Bridging the Gap Between Current AI Practices and Established Risk Management
Simeon Campos, Henry Papadatos, Fabien Roger, Chloé Touzet, Malcolm Murray, Otter Quarks
20 Feb 2025

OCCULT: Evaluating Large Language Models for Offensive Cyber Operation Capabilities
Michael Kouremetis, Marissa Dotter, Alex Byrne, Dan Martin, Ethan Michalak, Gianpaolo Russo, Michael Threet, Guido Zarrella
Tags: ELM
18 Feb 2025

The AI Agent Index
Stephen Casper, Luke Bailey, Rosco Hunter, Carson Ezell, Emma Cabalé, ..., Phillip J. K. Christoffersen, A. Pinar Ozisik, Rakshit Trivedi, Dylan Hadfield-Menell, Noam Kolt
03 Feb 2025

HackSynth: LLM Agent and Evaluation Framework for Autonomous Penetration Testing
Lajos Muzsai, David Imolai, András Lukács
Tags: LLMAG
02 Dec 2024

Evaluating Large Language Models' Capability to Launch Fully Automated Spear Phishing Campaigns: Validated on Human Subjects
Fred Heiding, Simon Lermen, Andrew Kao, B. Schneier, A. Vishwanath
30 Nov 2024

What AI evaluations for preventing catastrophic risks can and cannot do
Peter Barnett, Lisa Thiergart
Tags: ELM
26 Nov 2024

Voice-Enabled AI Agents can Perform Common Scams
Richard Fang, Dylan Bowman, Daniel Kang
21 Oct 2024

The Best Defense is a Good Offense: Countering LLM-Powered Cyberattacks
Daniel Ayzenshteyn, Roy Weiss, Yisroel Mirsky
Tags: AAML
20 Oct 2024

Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Andrey Anurin, Jonathan Ng, Kibo Schaffer, Jason Schreiber, Esben Kran
Tags: ELM
10 Oct 2024

Applying Refusal-Vector Ablation to Llama 3.1 70B Agents
Simon Lermen, Mateusz Dziemian, Govind Pimpale
Tags: LLMAG
08 Oct 2024

Exploring LLMs for Malware Detection: Review, Framework Design, and Countermeasure Approaches
Jamal N. Al-Karaki, Muhammad Al-Zafar Khan, Marwan Omar
11 Sep 2024

CIPHER: Cybersecurity Intelligent Penetration-testing Helper for Ethical Researcher
Derry Pratama, Naufal Suryanto, Andro Aprila Adiputra, Thi-Thu-Huong Le, Ahmada Yusril Kadiptya, Muhammad Iqbal, Howon Kim
21 Aug 2024

A Jailbroken GenAI Model Can Cause Substantial Harm: GenAI-powered Applications are Vulnerable to PromptWares
Stav Cohen, Ron Bitton, Ben Nassi
Tags: SILM
09 Aug 2024

Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
Lei Zhang, Yunshui Li, Jiaming Li, Xiaobo Xia, Jiaxi Yang, Run Luo, Minzheng Wang, Longze Chen, Junhao Liu, Min Yang
26 Jun 2024

Adversaries Can Misuse Combinations of Safe Models
Erik Jones, Anca Dragan, Jacob Steinhardt
20 Jun 2024

Teams of LLM Agents can Exploit Zero-Day Vulnerabilities
Richard Fang, Antony Kellermann, Akul Gupta, Qiusi Zhan, R. Bindu, Daniel Kang
Tags: LLMAG
02 Jun 2024

When LLMs Meet Cybersecurity: A Systematic Literature Review
Jie Zhang, Haoyu Bu, Hui Wen, Yu Chen, Lun Li, Hongsong Zhu
06 May 2024

Can LLMs Understand Computer Networks? Towards a Virtual System Administrator
Denis Donadel, Francesco Marchiori, Luca Pajola, Mauro Conti
19 Apr 2024

A Safe Harbor for AI Evaluation and Red Teaming
Shayne Longpre, Sayash Kapoor, Kevin Klyman, Ashwin Ramaswami, Rishi Bommasani, ..., Daniel Kang, Sandy Pentland, Arvind Narayanan, Percy Liang, Peter Henderson
07 Mar 2024

LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Simon Lermen, Charlie Rogers-Smith, Jeffrey Ladish
Tags: ALM
31 Oct 2023

LLMs as Hackers: Autonomous Linux Privilege Escalation Attacks
A. Happe, Aaron Kaplan, Jürgen Cito
17 Oct 2023

MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang, Zihan Wang, Jiateng Liu, Yangyi Chen, Lifan Yuan, Hao Peng, Heng Ji
Tags: LRM
19 Sep 2023

Emergent autonomous scientific research capabilities of large language models
Daniil A. Boiko, R. MacKnight, Gabe Gomes
Tags: ELM, LM&Ro, AI4CE, LLMAG
11 Apr 2023

ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao
Tags: LLMAG, ReLM, LRM
06 Oct 2022

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, F. Xia, Ed H. Chi, Quoc Le, Denny Zhou
Tags: LM&Ro, LRM, AI4CE, ReLM
28 Jan 2022