
CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution

Abstract

Large Language Model (LLM) agents can automate cybersecurity tasks and adapt to the evolving cybersecurity landscape without re-engineering. While LLM agents have demonstrated cybersecurity capabilities in Capture-The-Flag (CTF) competitions, they have two key limitations: they cannot access the latest cybersecurity expertise beyond their training data, and they struggle to integrate new knowledge into complex task planning. Knowledge-based approaches that incorporate technical understanding into task-solving automation can address these limitations. We present CRAKEN, a knowledge-based LLM agent framework that improves cybersecurity capability through three core mechanisms: contextual decomposition of task-critical information, iterative self-reflected knowledge retrieval, and knowledge-hint injection that transforms insights into adaptive attack strategies. Comprehensive evaluations across different configurations show CRAKEN's effectiveness in multi-stage vulnerability detection and exploitation compared to previous approaches. Our extensible architecture establishes new methodologies for embedding security knowledge into LLM-driven cybersecurity agentic systems. With a knowledge database of CTF writeups, CRAKEN achieves 22% accuracy on NYU CTF Bench, outperforming prior work by 3% and setting a new state of the art. In an evaluation of MITRE ATT&CK techniques, CRAKEN solves 25-30% more techniques than prior work, demonstrating improved cybersecurity capabilities via knowledge-based execution. We make our framework open source to the public at this https URL.
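The three mechanisms named above (contextual decomposition, iterative self-reflected retrieval, and knowledge-hint injection) can be sketched as a simple loop. This is a minimal illustrative sketch, not CRAKEN's actual implementation; every name here (`decompose`, `retrieve`, `reflect`, `knowledge_hints`, the toy keyword-overlap scoring) is a hypothetical stand-in for the paper's components.

```python
# Hypothetical sketch of a knowledge-based agent loop: decompose the task
# context into queries, iteratively retrieve and self-reflect on candidate
# knowledge, then collect accepted snippets as hints for the solver prompt.
# All names and the keyword-overlap scoring are illustrative assumptions.

def decompose(task: str) -> list[str]:
    """Split the task description into task-critical query fragments."""
    return [frag.strip() for frag in task.split(".") if frag.strip()]

def retrieve(store: dict[str, str], query: str) -> list[str]:
    """Toy retrieval: return writeup snippets whose key shares a keyword."""
    words = set(query.lower().split())
    return [text for key, text in store.items() if words & set(key.lower().split())]

def reflect(query: str, snippet: str) -> bool:
    """Toy self-reflection: accept a snippet only if it overlaps the query."""
    return bool(set(query.lower().split()) & set(snippet.lower().split()))

def knowledge_hints(task: str, store: dict[str, str], max_rounds: int = 3) -> list[str]:
    """Gather knowledge hints via iterative retrieve-and-reflect rounds."""
    hints: list[str] = []
    for query in decompose(task):
        for _ in range(max_rounds):              # iterative retrieval
            accepted = [s for s in retrieve(store, query) if reflect(query, s)]
            if accepted:
                hints.extend(accepted)           # inject as hints downstream
                break
            query = " ".join(query.split()[1:])  # relax the query and retry
            if not query:
                break
    return hints

# Toy knowledge database of CTF writeup snippets.
store = {
    "format string exploit": "Use %n writes to overwrite the GOT entry.",
    "sql injection login": "Try ' OR 1=1 -- to bypass authentication.",
}
task = "Bypass the login form. The backend builds an sql query from raw input."
print(knowledge_hints(task, store))
```

In a full agent, `retrieve` would be an embedding search over the writeup database and `reflect` an LLM call judging relevance, but the control flow (decompose, retrieve, self-reflect, relax, inject) stays the same.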

@article{shao2025_2505.17107,
  title={CRAKEN: Cybersecurity LLM Agent with Knowledge-Based Execution},
  author={Minghao Shao and Haoran Xi and Nanda Rani and Meet Udeshi and Venkata Sai Charan Putrevu and Kimberly Milner and Brendan Dolan-Gavitt and Sandeep Kumar Shukla and Prashanth Krishnamurthy and Farshad Khorrami and Ramesh Karri and Muhammad Shafique},
  journal={arXiv preprint arXiv:2505.17107},
  year={2025}
}