arXiv:2309.01940
CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models
5 September 2023
Lingyue Fu, Huacan Chai, Shuang Luo, Kounianhua Du, Weiming Zhang, Longteng Fan, Jiayi Lei, Renting Rui, Jianghao Lin, Yuchen Fang, Yifan Liu, Jingkuan Wang, Siyuan Qi, Kangning Zhang, Weinan Zhang, Yong Yu
Community: ELM
Papers citing "CodeApex: A Bilingual Programming Evaluation Benchmark for Large Language Models" (9 papers)
Polymath: A Challenging Multi-modal Mathematical Reasoning Benchmark
Himanshu Gupta, Shreyas Verma, Ujjwala Anantheswaran, Kevin Scaria, Mihir Parmar, Swaroop Mishra, Chitta Baral
Communities: ReLM, LRM
32
5
0
06 Oct 2024
VersiCode: Towards Version-controllable Code Generation
Tongtong Wu, Weigang Wu, Xingyu Wang, Kang Xu, Suyu Ma, Bo Jiang, Ping Yang, Zhenchang Xing, Yuan-Fang Li, Gholamreza Haffari
45
4
0
11 Jun 2024
CyberMetric: A Benchmark Dataset based on Retrieval-Augmented Generation for Evaluating LLMs in Cybersecurity Knowledge
Norbert Tihanyi, M. Ferrag, Ridhi Jain, Tamás Bisztray, Merouane Debbah
Community: ELM
41
22
0
12 Feb 2024
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation
Yifei Xu, Yuning Chen, Xumiao Zhang, Xianshang Lin, Pan Hu†, ..., Songwu Lu, Wan Du, Z. Mao, Ennan Zhai, Dennis Cai
Community: ALM
40
9
0
10 Nov 2023
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu, Hritik Bansal, Tony Xia, Jiacheng Liu, Chun-yue Li, Hannaneh Hajishirzi, Hao Cheng, Kai-Wei Chang, Michel Galley, Jianfeng Gao
Communities: LRM, MLLM
43
509
0
03 Oct 2023
LawBench: Benchmarking Legal Knowledge of Large Language Models
Zhiwei Fei, Xiaoyu Shen, D. Zhu, Fengzhe Zhou, Zhuo Han, Songyang Zhang, Kai-xiang Chen, Zongwen Shen, Jidong Ge
Communities: ELM, AILaw
36
36
0
28 Sep 2023
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng, Xiao Liu, Zhengxiao Du, Zihan Wang, Hanyu Lai, ..., Jidong Zhai, Wenguang Chen, Peng-Zhen Zhang, Yuxiao Dong, Jie Tang
Communities: BDL, LRM
253
1,073
0
05 Oct 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang, Weishi Wang, Chenyu You, Guosheng Lin
246
1,492
0
02 Sep 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks, Steven Basart, Saurav Kadavath, Mantas Mazeika, Akul Arora, ..., Collin Burns, Samir Puranik, Horace He, D. Song, Jacob Steinhardt
Communities: ELM, AIMat, ALM
208
627
0
20 May 2021