arXiv:2403.08604
DevBench: A Comprehensive Benchmark for Software Development
13 March 2024
Bowen Li
Wenhan Wu
Ziwei Tang
Lin Shi
John Yang
Jinyang Li
Shunyu Yao
Chao Qian
Binyuan Hui
Qicheng Zhang
Zhiyin Yu
He Du
Ping Yang
Dahua Lin
Chao Peng
Kai Chen
Papers citing "DevBench: A Comprehensive Benchmark for Software Development"
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Kai Xu
YiWei Mao
XinYi Guan
ZiLong Feng
43
0
0
12 May 2025
How Efficient is LLM-Generated Code? A Rigorous & High-Standard Benchmark
Ruizhong Qiu
Weiliang Will Zeng
Hanghang Tong
James Ezick
Christopher Lott
88
16
0
20 Feb 2025
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
229
574
0
03 May 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
189
799
0
02 May 2023
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
208
627
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
201
1,109
0
09 Feb 2021