DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories

30 May 2024
Jia Li, Ge Li, Yunfei Zhao, Yongming Li, Huanyu Liu, Hao Zhu, Lecheng Wang, Kaibo Liu, Zheng Fang, Lanshen Wang, Jiazheng Ding, Xuanming Zhang, Yuqi Zhu, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li
ALM

Papers citing "DevEval: A Manually-Annotated Code Generation Benchmark Aligned with Real-World Code Repositories"

19 / 19 papers shown

Rethinking Repetition Problems of LLMs in Code Generation
Yihong Dong, Yuchen Liu, Xue Jiang, Zhi Jin, Ge Li
15 May 2025

CRUST-Bench: A Comprehensive Benchmark for C-to-safe-Rust Transpilation
Anirudh Khatry, Robert Zhang, Jia Pan, Ziteng Wang, Qiaochu Chen, Greg Durrett, Isil Dillig
21 Apr 2025

SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers
Yanzheng Xiang, Hanqi Yan, Shuyin Ouyang, Lin Gui, Yulan He
31 Mar 2025

FEA-Bench: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation
Wei Li, Xin Zhang, Zhongxin Guo, Shaoguang Mao, Wen Luo, Guangyue Peng, Yangyu Huang, Houfeng Wang, Scarlett Li
09 Mar 2025

Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani, Philippe de Bekker, M. Izadi
VLM
07 Mar 2025

CodeIF-Bench: Evaluating Instruction-Following Capabilities of Large Language Models in Interactive Code Generation
Peiding Wang, L. Zhang, Fang Liu, Lin Shi, Minxiao Li, Bo Shen, An Fu
ELM, LRM
05 Mar 2025

Isolating Language-Coding from Problem-Solving: Benchmarking LLMs with PseudoEval
Jiarong Wu, Songqiang Chen, Jialun Cao, Hau Ching Lo, Shing-Chi Cheung
26 Feb 2025

CodeSwift: Accelerating LLM Inference for Efficient Code Generation
Qianhui Zhao, L. Zhang, Fang Liu, Xiaoli Lian, Qiaoyuanhe Meng, Ziqian Jiao, Zetong Zhou, Borui Zhang, Runlin Guo, Jia Li
24 Feb 2025

How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao, Yuk-Kit Chan, Zixuan Ling, Wenxuan Wang, Shuqing Li, ..., Pinjia He, Shuai Wang, Zibin Zheng, Michael R. Lyu, Shing-Chi Cheung
ALM
18 Jan 2025

Seeker: Towards Exception Safety Code Generation with Intermediate Language Agents Framework
Xuanming Zhang, Yuxuan Chen, Yiming Zheng, Zhexin Zhang, Yuan Yuan, Minlie Huang
LLMAG
16 Dec 2024

EvoCodeBench: An Evolving Code Generation Benchmark with Domain-Specific Evaluations
Jia Li, Ge Li, Xuanming Zhang, Yunfei Zhao, Yihong Dong, Zhi Jin, Binhua Li, Fei Huang, Yongbin Li
ALM, ELM
30 Oct 2024

Seeker: Enhancing Exception Handling in Code with LLM-based Multi-Agent Approach
Xuanming Zhang, Yuxuan Chen, Yuan Yuan, Minlie Huang
LLMAG
09 Oct 2024

Codev-Bench: How Do LLMs Understand Developer-Centric Code Completion?
Zhenyu Pan, Rongyu Cao, Yongchang Cao, Yingwei Ma, Binhua Li, Fei Huang, Han Liu, Yongbin Li
02 Oct 2024

HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Huy N. Phan, Phong X. Nguyen, Nghi D. Q. Bui
LLMAG
09 Sep 2024

DOMAINEVAL: An Auto-Constructed Benchmark for Multi-Domain Code Generation
Qiming Zhu, Jialun Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun, Shing-Chi Cheung
ALM
23 Aug 2024

Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
Debalina Ghosh Paul, Hong Zhu, Ian Bayley
ALM, ELM
18 Jun 2024

Large Language Model-Aware In-Context Learning for Code Generation
Jia Li, Ge Li, Chongyang Tao, Jia Li, Huangzhao Zhang, Fang Liu, Zhi Jin
15 Oct 2023

Structured Chain-of-Thought Prompting for Code Generation
Jia Li, Ge Li, Yongming Li, Zhi Jin
LRM
11 May 2023

Measuring Coding Challenge Competence With APPS
Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, ..., Collin Burns, Samir Puranik, Horace He, D. Song, Jacob Steinhardt
ELM, AIMat, ALM
20 May 2021