ResearchTrend.AI

The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers

arXiv:2404.02806
3 April 2024
Hussein Mozannar
Valerie Chen
Mohammed Alsobay
Subhro Das
Sebastian Zhao
Dennis L. Wei
Manish Nagireddy
P. Sattigeri
Ameet Talwalkar
David Sontag
Tags: ELM

Papers citing "The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers"

15 / 15 papers shown
How Accurately Do Large Language Models Understand Code?
Sabaat Haroon, Ahmad Faraz Khan, Ahmad Humayun, Waris Gill, Abdul Haddi Amjad, A. R. Butt, Mohammad Taha Khan, Muhammad Ali Gulzar
Tags: ELM, LRM
06 Apr 2025
SPHERE: An Evaluation Card for Human-AI Systems
Qianou Ma, Dora Zhao, Xinran Zhao, Chenglei Si, Chenyang Yang, Ryan Louie, Ehud Reiter, Diyi Yang, Tongshuang Wu
Tags: ALM
24 Mar 2025
Human-AI Experience in Integrated Development Environments: A Systematic Literature Review
Agnia Sergeyuk, Ilya Zakharov, Ekaterina Koshchenko, M. Izadi
08 Mar 2025
How Should We Build A Benchmark? Revisiting 274 Code-Related Benchmarks For LLMs
Jialun Cao, Yuk-Kit Chan, Zixuan Ling, Wenxuan Wang, Shuqing Li, ..., Pinjia He, Shuai Wang, Zibin Zheng, Michael R. Lyu, S. Cheung
Tags: ALM
18 Jan 2025
Need Help? Designing Proactive AI Assistants for Programming
Valerie Chen, Alan Zhu, Sebastian Zhao, Hussein Mozannar, David Sontag, Ameet Talwalkar
06 Oct 2024
Not the Silver Bullet: LLM-enhanced Programming Error Messages are Ineffective in Practice
E. Santos, Brett A. Becker
27 Sep 2024
MaPPing Your Model: Assessing the Impact of Adversarial Attacks on LLM-based Programming Assistants
John Heibel, Daniel Lowd
Tags: AAML
12 Jul 2024
A SMART Mnemonic Sounds like "Glue Tonic": Mixing LLMs with Student Feedback to Make Mnemonic Learning Stick
Nishant Balepur, Matthew Shu, Alexander Hoyle, Alison Robey, Shi Feng, Seraphina Goldfarb-Tarrant, Jordan Boyd-Graber
21 Jun 2024
On the Limitations of Embedding Based Methods for Measuring Functional Correctness for Code Generation
Atharva Naik
26 Apr 2024
How Do Analysts Understand and Verify AI-Assisted Data Analyses?
Ken Gu, Ruoxi Shang, Tim Althoff, Chenglong Wang, Steven Drucker
Tags: AAML
19 Sep 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu, Chun Xia, Yuyao Wang, Lingming Zhang
Tags: ELM, ALM
02 May 2023
Aligning Offline Metrics and Human Judgments of Value for Code Generation Models
Victor C. Dibia, Adam Fourney, Gagan Bansal, Forough Poursabzi-Sangdeh, Han Liu, Saleema Amershi
Tags: ALM, OffRL
29 Oct 2022
Grounded Copilot: How Programmers Interact with Code-Generating Models
Shraddha Barke, M. James, Nadia Polikarpova
30 Jun 2022
Productivity Assessment of Neural Code Completion
Albert Ziegler, Eirini Kalliamvakou, Shawn Simister, Ganesh Sittampalam, Alice Li, Andrew Rice, Devon Rifkin, E. Aftandilian
13 May 2022
CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation
Shuai Lu, Daya Guo, Shuo Ren, Junjie Huang, Alexey Svyatkovskiy, ..., Nan Duan, Neel Sundaresan, Shao Kun Deng, Shengyu Fu, Shujie Liu
Tags: ELM
09 Feb 2021