95

Detecting Prompt Injection Attacks Against Application Using Classifiers

Safwan Shaheer
G. M. Refatul Islam
Mohammad Rafid Hamid
Md. Abrar Faiaz Khan
Md. Omar Faruk
Yaseen Nur
Main:7 Pages
13 Figures
Bibliography:1 Pages
4 Tables
Abstract

Prompt injection attacks can compromise the security and stability of critical systems, from infrastructure to large web applications. This work curates and augments a prompt injection dataset based on the HackAPrompt Playground Submissions corpus and trains several classifiers, including LSTM, feed forward neural networks, Random Forest, and Naive Bayes, to detect malicious prompts in LLM integrated web applications. The proposed approach improves prompt injection detection and mitigation, helping protect targeted applications and systems.

View on arXiv
Comments on this paper