Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models

9 February 2025
Marc Bruni
Fabio Gabrielli
Mohammad Ghafari
Martin Kropp
Abstract

Prompt engineering reduces reasoning mistakes in Large Language Models (LLMs). However, its effectiveness in mitigating vulnerabilities in LLM-generated code remains underexplored. To address this gap, we implemented a benchmark to automatically assess the impact of various prompt engineering strategies on code security. Our benchmark leverages two peer-reviewed prompt datasets and employs static scanners to evaluate code security at scale. We tested multiple prompt engineering techniques on GPT-3.5-turbo, GPT-4o, and GPT-4o-mini. Our results show that for GPT-4o and GPT-4o-mini, a security-focused prompt prefix can reduce the occurrence of security vulnerabilities by up to 56%. Additionally, all tested models demonstrated the ability to detect and repair between 41.9% and 68.7% of vulnerabilities in previously generated code when using iterative prompting techniques. Finally, we introduce a "prompt agent" that demonstrates how the most effective techniques can be applied in real-world development workflows.
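The abstract describes two of the benchmarked patterns: a security-focused prompt prefix prepended to a generation request, and an iterative repair prompt that feeds static-scanner findings back to the model. A minimal sketch of how such prompts could be constructed is shown below; the prefix wording, function names, and findings format are illustrative assumptions, not the paper's exact prompts.

```python
# Illustrative sketch of the two prompt patterns described in the abstract.
# The prefix text and report layout are assumptions for demonstration only.

SECURITY_PREFIX = (
    "You are a security-aware developer. Generate code that avoids common "
    "vulnerabilities (e.g., injection, hard-coded secrets, insecure "
    "deserialization) and follows secure coding best practices.\n\n"
)

def secure_generation_prompt(task: str) -> str:
    """Prepend a security-focused prefix to a code-generation task."""
    return SECURITY_PREFIX + task

def iterative_repair_prompt(code: str, findings: list[str]) -> str:
    """Ask the model to repair code flagged by a static security scanner."""
    report = "\n".join(f"- {finding}" for finding in findings)
    return (
        "The following code was flagged by a static security scanner.\n\n"
        f"Code:\n{code}\n\n"
        f"Findings:\n{report}\n\n"
        "Rewrite the code to fix every finding while preserving its behavior."
    )
```

In an iterative workflow, the generated code would be passed through a static scanner, and any findings fed into `iterative_repair_prompt` for another model call, repeating until the scanner reports no issues or an iteration limit is reached.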

@article{bruni2025_2502.06039,
  title={Benchmarking Prompt Engineering Techniques for Secure Code Generation with GPT Models},
  author={Marc Bruni and Fabio Gabrielli and Mohammad Ghafari and Martin Kropp},
  journal={arXiv preprint arXiv:2502.06039},
  year={2025}
}