2401.15963
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
29 January 2024
Manav Singhal
Tushar Aggarwal
Abhijeet Awasthi
Nagarajan Natarajan
Aditya Kanade
Papers citing
"NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness"
15 / 15 papers shown
Title
RobuNFR: Evaluating the Robustness of Large Language Models on Non-Functional Requirements Aware Code Generation
Feng Lin
Dong Jae Kim
Z. Li
Jinqiu Yang
Tse-Hsun Chen
AAML
38
0
0
28 Mar 2025
Robust Learning of Diverse Code Edits
Tushar Aggarwal
Swayam Singh
Abhijeet Awasthi
Aditya Kanade
Nagarajan Natarajan
SyDa
157
0
0
05 Mar 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao
Yuk-Kit Chan
Zixuan Ling
Wenxuan Wang
Shuqing Li
...
Pinjia He
Shuai Wang
Zibin Zheng
Michael R. Lyu
Shing-Chi Cheung
ALM
71
1
0
18 Jan 2025
Automatic Programming: Large Language Models and Beyond
Michael R. Lyu
Baishakhi Ray
Abhik Roychoudhury
Shin Hwei Tan
Patanamon Thongtanunam
33
15
0
03 May 2024
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
Martin Weyssow
Aton Kamanda
H. Sahraoui
ALM
59
30
0
14 Mar 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
36
274
0
12 Mar 2024
Guiding Language Models of Code with Global Context using Monitors
Lakshya A Agrawal
Aditya Kanade
Navin Goyal
Shuvendu K. Lahiri
S. Rajamani
38
23
0
19 Jun 2023
CodeGen2: Lessons for Training LLMs on Programming and Natural Languages
Erik Nijkamp
A. Ghobadzadeh
Caiming Xiong
Silvio Savarese
Yingbo Zhou
152
164
0
03 May 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
183
791
0
02 May 2023
DeepPERF: A Deep Learning-Based Approach For Improving Software Performance
Spandan Garg
Roshanak Zilouchian Moghaddam
Colin B. Clement
Neel Sundaresan
Chen Henry Wu
32
7
0
27 Jun 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
364
8,495
0
28 Jan 2022
CodeQA: A Question Answering Dataset for Source Code Comprehension
Chenxiao Liu
Xiaojun Wan
37
27
0
17 Sep 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
208
624
0
20 May 2021
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu
Daya Guo
Shuo Ren
Junjie Huang
Alexey Svyatkovskiy
...
Nan Duan
Neel Sundaresan
Shao Kun Deng
Shengyu Fu
Shujie Liu
ELM
198
1,105
0
09 Feb 2021
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
240
4,469
0
23 Jan 2020