Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.19502
Cited By
CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation
26 May 2025
Guang Yang
Yu Zhou
Xiang Chen
Wei-Shi Zheng
Xing Hu
Xin Zhou
David Lo
Taolue Chen
ALM
LRM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"CODE-DITING: A Reasoning-Based Metric for Functional Alignment in Code Evaluation"
23 / 23 papers shown
Title
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Zhangchen Xu
Yang Liu
Yueqin Yin
Mingyuan Zhou
Radha Poovendran
ALM
OffRL
118
18
0
04 Mar 2025
Preference Leakage: A Contamination Problem in LLM-as-a-judge
Dawei Li
Renliang Sun
Yue Huang
Ming Zhong
Bohan Jiang
Jiawei Han
Wei Wei
Wei Wang
Huan Liu
157
30
0
03 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
380
2,013
0
22 Jan 2025
From Generation to Judgment: Opportunities and Challenges of LLM-as-a-judge
Dawei Li
Bohan Jiang
Liangjie Huang
Alimohammad Beigi
Chengshuai Zhao
...
Canyu Chen
Tianhao Wu
Kai Shu
Lu Cheng
Huan Liu
ELM
AILaw
321
112
0
25 Nov 2024
OpenCoder: The Open Cookbook for Top-Tier Code Large Language Models
Siming Huang
Tianhao Cheng
J.K. Liu
Jiaran Hao
L. Song
...
Ge Zhang
Zili Wang
Yuan Qi
Yinghui Xu
Wei Chu
ALM
181
31
0
07 Nov 2024
CodeJudge: Evaluating Code Generation with Large Language Models
Weixi Tong
Tianyi Zhang
ELM
ALM
53
16
0
03 Oct 2024
Qwen2.5-Coder Technical Report
Binyuan Hui
Jian Yang
Zeyu Cui
Jiaxi Yang
Dayiheng Liu
...
Fei Huang
Xingzhang Ren
Xuancheng Ren
Jingren Zhou
Junyang Lin
OSLM
117
337
0
18 Sep 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
206
193
0
22 Jun 2024
Benchmarks and Metrics for Evaluations of Code Generation: A Critical Review
Debalina Ghosh Paul
Hong Zhu
Ian Bayley
ALM
ELM
78
12
0
18 Jun 2024
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Atharva Naik
92
3
0
26 Apr 2024
PiSSA: Principal Singular Values and Singular Vectors Adaptation of Large Language Models
Fanxu Meng
Zhaohui Wang
Muhan Zhang
VLM
153
103
0
03 Apr 2024
DeepSeek-Coder: When the Large Language Model Meets Programming -- The Rise of Code Intelligence
Daya Guo
Qihao Zhu
Dejian Yang
Zhenda Xie
Kai Dong
...
Yu-Huan Wu
Yiming Li
Fuli Luo
Yingfei Xiong
W. Liang
ELM
128
804
0
25 Jan 2024
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
199
2,330
0
12 Sep 2023
Large Language Models for Software Engineering: A Systematic Literature Review
Xinying Hou
Yanjie Zhao
Yue Liu
Zhou Yang
Kailong Wang
Li Li
Xiapu Luo
David Lo
John C. Grundy
Haoyu Wang
80
424
0
21 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
469
4,444
0
09 Jun 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
263
958
0
02 May 2023
ICE-Score: Instructing Large Language Models to Evaluate Code
Terry Yue Zhuo
ELM
ALM
97
45
0
27 Apr 2023
CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Shuyan Zhou
Uri Alon
Sumit Agarwal
Graham Neubig
ELM
ALM
99
114
0
10 Feb 2023
Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators
Pietro Liguori
Cristina Improta
R. Natella
B. Cukic
Domenico Cotroneo
ELM
111
17
0
12 Dec 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
859
9,714
0
28 Jan 2022
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
238
5,675
0
07 Jul 2021
CodeBLEU: a Method for Automatic Evaluation of Code Synthesis
Shuo Ren
Daya Guo
Shuai Lu
Long Zhou
Shujie Liu
Duyu Tang
Neel Sundaresan
M. Zhou
Ambrosio Blanco
Shuai Ma
ELM
128
543
0
22 Sep 2020
Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification
Kaiming He
Xinming Zhang
Shaoqing Ren
Jian Sun
VLM
355
18,661
0
06 Feb 2015
1