Text Prompt Injection of Vision Language Models

10 October 2025

Ruizhe Zhu

SILM

VLM

ArXiv (abs)PDF HTML Github

Main:4 Pages

2 Figures

Bibliography:3 Pages

5 Tables

Appendix:2 Pages

Abstract

The widespread application of large vision language models has significantly raised safety concerns. In this project, we investigate text prompt injection, a simple yet effective method to mislead these models. We developed an algorithm for this type of attack and demonstrated its effectiveness and efficiency through experiments. Compared to other attack methods, our approach is particularly effective for large models without high demand for computational resources.

View on arXiv

Comments on this paper