2203.08388
Cited By
MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages
16 March 2022
Zhiruo Wang
Grace Cuenca
Shuyan Zhou
Frank F. Xu
Graham Neubig
Papers citing
"MCoNaLa: A Benchmark for Code Generation from Multiple Natural Languages"
33 / 33 papers shown
Title
LLM Benchmarking with LLaMA2: Evaluating Code Development Performance Across Multiple Programming Languages
Patrick Diehl
Nojoud Nader
Maxim Moraru
Steven R. Brandt
39
1
0
24 Mar 2025
Benchmarking AI Models in Software Engineering: A Review, Search Tool, and Enhancement Protocol
Roham Koohestani
Philippe de Bekker
M. Izadi
VLM
45
0
0
07 Mar 2025
ThrowBench: Benchmarking LLMs by Predicting Runtime Exceptions
Julian Aron Prenner
Romain Robbes
61
0
0
06 Mar 2025
How Diversely Can Language Models Solve Problems? Exploring the Algorithmic Diversity of Model-Generated Code
Seonghyeon Lee
Heejae Chon
Joonwon Jang
Dongha Lee
Hwanjo Yu
ALM
39
0
0
02 Mar 2025
LessLeak-Bench: A First Investigation of Data Leakage in LLMs Across 83 Software Engineering Benchmarks
Xin Zhou
Martin Weyssow
Ratnadira Widyasari
Ting Zhang
Junda He
Yunbo Lyu
Jianming Chang
Beiqi Zhang
Dan Huang
David Lo
PILM
297
1
0
10 Feb 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
Shing-Chi Cheung
ALM
71
1
0
18 Jan 2025
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
41
23
0
04 Oct 2024
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Huy N. Phan
Phong X. Nguyen
Nghi D. Q. Bui
LLMAG
33
11
0
09 Sep 2024
Is Functional Correctness Enough to Evaluate Code Language Models? Exploring Diversity of Generated Codes
Heejae Chon
Seonghyeon Lee
Jinyoung Yeo
Dongha Lee
ALM
41
1
0
24 Aug 2024
Training Task Experts through Retrieval Based Distillation
Jiaxin Ge
Xueying Jia
Vijay Viswanathan
Hongyin Luo
Graham Neubig
40
3
0
07 Jul 2024
AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries
Irina Saparina
Mirella Lapata
57
11
0
27 Jun 2024
Automatic Programming: Large Language Models and Beyond
Michael R. Lyu
Baishakhi Ray
Abhik Roychoudhury
Shin Hwei Tan
Patanamon Thongtanunam
33
15
0
03 May 2024
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Atharva Naik
46
2
0
26 Apr 2024
Analyzing the Performance of Large Language Models on Code Summarization
Rajarshi Haldar
J. Hockenmaier
43
18
0
10 Apr 2024
Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
Libo Qin
Qiguang Chen
Yuhang Zhou
Zhi Chen
Hai-Tao Zheng
Lizi Liao
Min Li
Wanxiang Che
Philip S. Yu
LRM
55
36
0
07 Apr 2024
CloudEval-YAML: A Practical Benchmark for Cloud Configuration Generation
Yifei Xu
Yuning Chen
Xumiao Zhang
Xianshang Lin
Pan Hu
...
Songwu Lu
Wan Du
Z. Mao
Ennan Zhai
Dennis Cai
ALM
40
9
0
10 Nov 2023
BioCoder: A Benchmark for Bioinformatics Code Generation with Large Language Models
Xiangru Tang
Bill Qian
Rick Gao
Jiakang Chen
Xinyun Chen
Mark B. Gerstein
23
11
0
31 Aug 2023
Prompt2Model: Generating Deployable Models from Natural Language Instructions
Vijay Viswanathan
Chenyang Zhao
Amanda Bertsch
Tongshuang Wu
Graham Neubig
33
36
0
23 Aug 2023
OctoPack: Instruction Tuning Code Large Language Models
Niklas Muennighoff
Qian Liu
A. Zebaze
Qinkai Zheng
Binyuan Hui
Terry Yue Zhuo
Swayam Singh
Xiangru Tang
Leandro von Werra
Shayne Longpre
VLM
ALM
71
117
0
14 Aug 2023
Optimal Transport Posterior Alignment for Cross-lingual Semantic Parsing
Tom Sherborne
Tom Hosking
Mirella Lapata
OT
24
4
0
09 Jul 2023
XSemPLR: Cross-Lingual Semantic Parsing in Multiple Natural Languages and Meaning Representations
Yusen Zhang
Jun Wang
Zhiguo Wang
Rui Zhang
VLM
73
9
0
07 Jun 2023
"What It Wants Me To Say": Bridging the Abstraction Gap Between End-User Programmers and Code-Generating Large Language Models
Michael Xieyang Liu
Advait Sarkar
Carina Negreanu
B. Zorn
Jack Williams
N. Toronto
Andrew D. Gordon
29
106
0
13 Apr 2023
xCodeEval: A Large Scale Multilingual Multitask Benchmark for Code Understanding, Generation, Translation and Retrieval
Mohammad Abdullah Matin Khan
M Saiful Bari
Xuan Long Do
Weishi Wang
Md. Rizwan Parvez
Chenyu You
ALM
ELM
34
14
0
06 Mar 2023
Measuring The Impact Of Programming Language Distribution
Gabriel Orlanski
Kefan Xiao
Xavier Garcia
Jeffrey Hui
Joshua Howland
J. Malmaud
Jacob Austin
Rishabh Singh
Michele Catasta
30
28
0
03 Feb 2023
Execution-Based Evaluation for Open-Domain Code Generation
Zhiruo Wang
Shuyan Zhou
Daniel Fried
Graham Neubig
ELM
37
80
0
20 Dec 2022
Large Language Models Meet NL2Code: A Survey
Daoguang Zan
B. Chen
Fengji Zhang
Di Lu
Bingchao Wu
Bei Guan
Yongji Wang
Jian-Guang Lou
ELM
ALM
31
170
0
19 Dec 2022
ERNIE-Code: Beyond English-Centric Cross-lingual Pretraining for Programming Languages
Yekun Chai
Shuohuan Wang
Chao Pang
Yu Sun
Hao Tian
Hua Wu
30
35
0
13 Dec 2022
Multi-lingual Evaluation of Code Generation Models
Ben Athiwaratkun
Sanjay Krishna Gouda
Zijian Wang
Xiaopeng Li
Yuchen Tian
...
Baishakhi Ray
Parminder Bhatia
Sudipta Sengupta
Dan Roth
Bing Xiang
ELM
120
161
0
26 Oct 2022
Bootstrapping Multilingual Semantic Parsers using Large Language Models
Abhijeet Awasthi
Nitish Gupta
Bidisha Samanta
Shachi Dave
Sunita Sarawagi
Partha P. Talukdar
40
7
0
13 Oct 2022
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation
Federico Cassano
John Gouwar
Daniel Nguyen
S. Nguyen
Luna Phipps-Costin
...
Carolyn Jane Anderson
Molly Q. Feldman
Arjun Guha
Michael Greenberg
Abhinav Jangda
ELM
24
81
0
17 Aug 2022
CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation
Yue Wang
Weishi Wang
Chenyu You
Guosheng Lin
243
1,492
0
02 Sep 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
208
627
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
201
1,109
0
09 Feb 2021