ResearchTrend.AI
ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation

10 March 2025
Kaiyuan Liu
Youcheng Pan
Junlin Li
Daojing He
Yang Xiang
Yexing Du
Tianrun Gao
ELM, LLMAG

Papers citing "ProjectEval: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation"

24 / 24 papers shown
AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology
Minh Huynh Nguyen
Thang Phan Chau
Phong X. Nguyen
Nghi D. Q. Bui
64
15
0
16 Jun 2024
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
Fang Liu
Yang Liu
Lin Shi
Houkun Huang
Ruifeng Wang
Zhen Yang
Li Zhang
Zhongqi Li
Yuchi Ma
82
119
0
01 Apr 2024
EvoCodeBench: An Evolving Code Generation Benchmark Aligned with Real-World Code Repositories
Jia Li
Ge Li
Xuanming Zhang
Yihong Dong
Zhi Jin
74
45
0
31 Mar 2024
CodeBenchGen: Creating Scalable Execution-based Code Generation Benchmarks
Yiqing Xie
Alex Xie
Divyanshu Sheth
Pengfei Liu
Daniel Fried
Carolyn Rose
80
11
0
31 Mar 2024
CodeS: Natural Language to Code Repository via Multi-Layer Sketch
Daoguang Zan
Ailun Yu
Wei Liu
Dong Chen
Bo Shen
...
Bei Guan
Zhiguang Yang
Yongji Wang
Qianxiang Wang
Li-zhen Cui
69
15
0
25 Mar 2024
HumanEval-XL: A Multilingual Code Generation Benchmark for Cross-lingual Natural Language Generalization
Qiwei Peng
Yekun Chai
Xuhong Li
ELM, LM&MA
71
42
0
26 Feb 2024
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
Dong Huang
Yuhao Qing
Weiyi Shang
Heming Cui
Jie M. Zhang
111
37
0
03 Feb 2024
CRUXEval: A Benchmark for Code Reasoning, Understanding and Execution
Alex Gu
Baptiste Rozière
Hugh Leather
Armando Solar-Lezama
Gabriel Synnaeve
Sida I. Wang
ELM, ALM, LRM
49
115
0
05 Jan 2024
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
103
591
0
10 Oct 2023
ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation
Xueying Du
Wentai Deng
Kaixin Wang
Hanlin Wang
Junwei Liu
Yixuan Chen
Jiayi Feng
Chaofeng Sha
Xin Peng
ELM, ALM
60
148
0
03 Aug 2023
ChatDev: Communicative Agents for Software Development
Cheng Qian
Wei Liu
Hongzhang Liu
Nuo Chen
Yufan Dang
...
Xin Cong
Juyuan Xu
Dahai Li
Zhiyuan Liu
Maosong Sun
LLMAG
78
211
0
16 Jul 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM, ALM
248
921
0
02 May 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM, PILM
1.5K
13,247
0
27 Feb 2023
ReCode: Robustness Evaluation of Code Generation Models
Shiqi Wang
Zheng Li
Haifeng Qian
Cheng Yang
Zijian Wang
...
Parminder Bhatia
Ramesh Nallapati
M. K. Ramanathan
Dan Roth
Bing Xiang
55
88
0
20 Dec 2022
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
Yuhang Lai
Chengxi Li
Yiming Wang
Tianyi Zhang
Ruiqi Zhong
Luke Zettlemoyer
Scott Yih
Daniel Fried
Sida Wang
Tao Yu
ELM, ALM
91
334
0
18 Nov 2022
Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun
Sanjay Krishna Gouda
Zijian Wang
Xiaopeng Li
Yuchen Tian
...
Baishakhi Ray
Parminder Bhatia
Sudipta Sengupta
Dan Roth
Bing Xiang
ELM
153
172
0
26 Oct 2022
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM, AIMat, ReCod, ALM
200
1,986
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM, ALM
233
5,539
0
07 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
Basel Alomair
Jacob Steinhardt
ELM, AIMat, ALM
254
681
0
20 May 2021
CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
Shuo Ren
Daya Guo
Shuai Lu
Long Zhou
Shujie Liu
Duyu Tang
Neel Sundaresan
M. Zhou
Ambrosio Blanco
Shuai Ma
ELM
112
534
0
22 Sep 2020
Making Monolingual Sentence Embeddings Multilingual using Knowledge Distillation
Nils Reimers
Iryna Gurevych
104
1,025
0
21 Apr 2020
CodeSearchNet Challenge: Evaluating the State of Semantic Code Search
Hamel Husain
Ho-Hsiang Wu
Tiferet Gazit
Miltiadis Allamanis
Marc Brockschmidt
ELM
130
1,079
0
20 Sep 2019
Mapping Language to Code in Programmatic Context
Srinivasan Iyer
Ioannis Konstas
Alvin Cheung
Luke Zettlemoyer
69
238
0
29 Aug 2018
Learning to Mine Aligned Code and Natural Language Pairs from Stack Overflow
Pengcheng Yin
Bowen Deng
Edgar Chen
Bogdan Vasilescu
Graham Neubig
63
304
0
23 May 2018