2412.21199
Cited By
HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation
3 January 2025
Zhaojian Yu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
Papers citing "HumanEval Pro and MBPP Pro: Evaluating Large Language Models on Self-invoking Code Generation"
4 / 4 papers shown
Title
Activation-Guided Consensus Merging for Large Language Models
Yuxuan Yao
Shuqi Liu
Zehua Liu
Qintong Li
Mingyang Liu
Xiongwei Han
Zhijiang Guo
Han Wu
Linqi Song
MoMe
9
0
0
20 May 2025
AutoGEEval: A Multimodal and Automated Framework for Geospatial Code Generation on GEE with Large Language Models
Shuyang Hou
Zhangxiao Shen
Huayi Wu
Jianyuan Liang
Haoyue Jiao
...
Xiaopu Zhang
Xu Li
Zhipeng Gui
Xuefeng Guan
Longgang Xiang
ELM
2
0
0
19 May 2025
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Lennart Luettgau
Harry Coppock
Magda Dubois
Christopher Summerfield
Cozmin Ududec
31
0
0
08 May 2025
CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
Sizhe Wang
Zihan Wang
Dongsheng Ma
Yongan Yu
Rui Ling
Zehan Li
Zhiyu Li
Wenbo Zhang
LRM
65
0
0
30 Apr 2025