AIGCodeSet: A New Annotated Dataset for AI Generated Code Detection

21 December 2024
Basak Demirok
Mucahid Kutlu
Abstract

While large language models (LLMs) offer significant convenience in software development, their use can raise ethical issues in settings such as job interviews and student assignments. Determining whether a piece of code was written by a human or generated by an artificial intelligence (AI) model is therefore a critical problem. In this study, we present AIGCodeSet, which consists of 2,828 AI-generated and 4,755 human-written Python code samples. The AI-generated samples were created using CodeLlama 34B, Codestral 22B, and Gemini 1.5 Flash. We also report the results of experiments with baseline detection methods, which show that a Bayesian classifier outperforms the other models.
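The abstract does not say which Bayesian classifier or which code features were used. Purely as an illustration, here is a minimal sketch that assumes a multinomial Naive Bayes model over code tokens with Laplace smoothing. The regex tokenizer, the class `NaiveBayesCodeDetector`, the labels `"ai"` and `"human"`, and the toy training data are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Hypothetical tokenizer (an assumption, not the paper's feature set):
# identifiers/keywords, integer literals, and any other single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> list[str]:
    return TOKEN_RE.findall(code)


class NaiveBayesCodeDetector:
    """Hypothetical multinomial Naive Bayes with Laplace smoothing (illustrative sketch)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha  # additive smoothing constant

    def fit(self, samples: list[str], labels: list[str]) -> "NaiveBayesCodeDetector":
        self.classes = sorted(set(labels))
        self.doc_counts = Counter(labels)  # used for the class priors
        self.token_counts = {c: Counter() for c in self.classes}
        for code, label in zip(samples, labels):
            self.token_counts[label].update(tokenize(code))
        self.vocab = set()
        for counts in self.token_counts.values():
            self.vocab.update(counts)
        self.totals = {c: sum(self.token_counts[c].values()) for c in self.classes}
        return self

    def log_score(self, code: str, label: str) -> float:
        # log P(label) + sum over tokens of log P(token | label)
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = self.totals[label] + self.alpha * len(self.vocab)
        for tok in tokenize(code):
            if tok not in self.vocab:
                continue  # tokens never seen in training are ignored
            score += math.log((self.token_counts[label][tok] + self.alpha) / denom)
        return score

    def predict(self, code: str) -> str:
        # Pick the label with the highest (unnormalized) log posterior.
        return max(self.classes, key=lambda c: self.log_score(code, c))


if __name__ == "__main__":
    # Toy data invented for illustration only (not from AIGCodeSet).
    train = [
        'def calculate_total(values):\n    """Return the total."""\n    return sum(values)',
        'def calculate_average(values):\n    """Return the average."""\n    return sum(values) / len(values)',
        "x=int(input())\nprint(x*2)",
        "n=int(input())\nprint(n+1)",
    ]
    labels = ["ai", "ai", "human", "human"]
    detector = NaiveBayesCodeDetector().fit(train, labels)
    print(detector.predict("n=int(input())\nprint(n*3)"))  # prints "human"
```

Scores are computed in log space so that multiplying many small token probabilities does not underflow. If a model like this were trained on the full dataset, the class prior would reflect the 2,828/4,755 AI/human split unless the data were rebalanced.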

@article{demirok2025_2412.16594,
  title={AIGCodeSet: A New Annotated Dataset for AI Generated Code Detection},
  author={Basak Demirok and Mucahid Kutlu},
  journal={arXiv preprint arXiv:2412.16594},
  year={2025}
}