Out of the BLEU: how should we assess quality of the Code Generation models?

5 August 2022
Mikhail Evtikhiev, Egor Bogomolov, Yaroslav Sokolov, T. Bryksin
ALM

Papers citing "Out of the BLEU: how should we assess quality of the Code Generation models?" (12 papers)

BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang, Yekyung Kim, Michael Krumdick, Amir Zadeh, Chuan Li, Chris Tanner, Mohit Iyyer
ALM · 16 May 2025

Bridging LLM-Generated Code and Requirements: Reverse Generation technique and SBC Metric for Developer Insights
Ahilan Ayyachamy Nadar Ponnusamy
11 Feb 2025

SyntheT2C: Generating Synthetic Data for Fine-Tuning Large Language Models on the Text2Cypher Task
Zijie Zhong, Linqing Zhong, Zhaoze Sun, Qingyun Jin, Zengchang Qin, Xiaofan Zhang
28 Jan 2025

CRScore: Grounding Automated Evaluation of Code Review Comments in Code Claims and Smells
Atharva Naik, Marcus Alenius, Daniel Fried, Carolyn Rosé
29 Sep 2024

Retrieval-augmented code completion for local projects using large language models
Marko Hostnik, Marko Robnik-Šikonja
RALM · 09 Aug 2024

Automating the Correctness Assessment of AI-generated Code for Security Contexts
Domenico Cotroneo, Alessio Foggia, Cristina Improta, Pietro Liguori, R. Natella
28 Oct 2023

Bias Testing and Mitigation in LLM-based Code Generation
Dong Huang, Qingwen Bu, Jie M. Zhang, Xiaofei Xie, Junjie Chen, Heming Cui
03 Sep 2023

CodeBERTScore: Evaluating Code Generation with Pretrained Models of Code
Shuyan Zhou, Uri Alon, Sumit Agarwal, Graham Neubig
ELM, ALM · 10 Feb 2023

Who Evaluates the Evaluators? On Automatic Metrics for Assessing AI-based Offensive Code Generators
Pietro Liguori, Cristina Improta, R. Natella, B. Cukic, Domenico Cotroneo
ELM · 12 Dec 2022

Reading Between the Lines: Modeling User Behavior and Costs in AI-Assisted Programming
Hussein Mozannar, Gagan Bansal, Adam Fourney, Eric Horvitz
25 Oct 2022

Don't Complete It! Preventing Unhelpful Code Completion for Productive and Sustainable Neural Code Completion Systems
Zhensu Sun, Xiaoning Du, Fu Song, Shangwen Wang, Mingze Ni, Li Li
13 Sep 2022

CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, ..., Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu
ELM · 09 Feb 2021