arXiv: 2105.09938
Measuring Coding Challenge Competence With APPS

20 May 2021
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
Ethan Guo
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
    ELM
    AIMat
    ALM

Papers citing "Measuring Coding Challenge Competence With APPS"

50 / 130 papers shown
Prompting Techniques for Secure Code Generation: A Systematic Investigation
Catherine Tony
Nicolás E. Díaz Ferreyra
Markus Mutas
Salem Dhiff
Riccardo Scandariato
SILM
73
9
0
09 Jul 2024
On Speeding Up Language Model Evaluation
Jin Peng Zhou
Christian K. Belardi
Ruihan Wu
Travis Zhang
Carla P. Gomes
Wen Sun
Kilian Q. Weinberger
58
1
0
08 Jul 2024
A Systematic Survey and Critical Review on Evaluating Large Language Models: Challenges, Limitations, and Recommendations
Md Tahmid Rahman Laskar
Sawsan Alqahtani
M Saiful Bari
Mizanur Rahman
Mohammad Abdullah Matin Khan
...
Chee Wei Tan
Md. Rizwan Parvez
Enamul Hoque
Shafiq R. Joty
Jimmy Huang
ELM
ALM
27
27
0
04 Jul 2024
CoIR: A Comprehensive Benchmark for Code Information Retrieval Models
Xiangyang Li
Kuicai Dong
Yi Quan Lee
Wei Xia
Yichun Yin
Xinyi Dai
Yasheng Wang
Ruiming Tang
59
15
0
03 Jul 2024
ConCodeEval: Evaluating Large Language Models for Code Constraints in Domain-Specific Languages
Mehant Kammakomati
Sameer Pimparkhede
Srikanth G. Tamilselvam
Prince Kumar
Pushpak Bhattacharyya
ALM
40
0
0
03 Jul 2024
Agentless: Demystifying LLM-based Software Engineering Agents
Chunqiu Steven Xia
Yinlin Deng
Soren Dunn
Lingming Zhang
LLMAG
41
84
0
01 Jul 2024
Applying RLAIF for Code Generation with API-usage in Lightweight LLMs
Sujan Dutta
Sayantan Mahinder
R. Anantha
Bortik Bandyopadhyay
ALM
36
4
0
28 Jun 2024
Figuring out Figures: Using Textual References to Caption Scientific Figures
Stanley Cao
Kevin Liu
34
0
0
25 Jun 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
74
131
0
22 Jun 2024
OlympicArena: Benchmarking Multi-discipline Cognitive Reasoning for Superintelligent AI
Zhen Huang
Zengzhi Wang
Shijie Xia
Xuefeng Li
Haoyang Zou
...
Yuxiang Zheng
Shaoting Zhang
Dahua Lin
Yu Qiao
Pengfei Liu
ELM
LRM
49
26
0
18 Jun 2024
ChartMimic: Evaluating LMM's Cross-Modal Reasoning Capability via Chart-to-Code Generation
Cheng Yang
Chufan Shi
Yaxin Liu
Bo Shui
Junjie Wang
...
Yuxiang Zhang
Gongye Liu
Xiaomei Nie
Deng Cai
Yujiu Yang
MLLM
LRM
51
22
0
14 Jun 2024
Benchmarking Generative Models on Computational Thinking Tests in Elementary Visual Programming
Victor-Alexandru Pădurean
Adish Singla
ELM
51
3
0
14 Jun 2024
DafnyBench: A Benchmark for Formal Software Verification
Chloe Loughridge
Qinyi Sun
Seth Ahrenbach
Federico Cassano
Chuyue Sun
Ying Sheng
Anish Mudide
Md Rakib Hossain Misu
Nada Amin
Max Tegmark
ALM
AI4CE
46
8
0
12 Jun 2024
Is Programming by Example solved by LLMs?
Wen-Ding Li
Kevin Ellis
37
9
0
12 Jun 2024
Learning Task Decomposition to Assist Humans in Competitive Programming
Jiaxin Wen
Ruiqi Zhong
Pei Ke
Zhihong Shao
Hongning Wang
Minlie Huang
ReLM
34
8
0
07 Jun 2024
Robo-Instruct: Simulator-Augmented Instruction Alignment For Finetuning Code LLMs
Zichao Hu
Junyi Jessy Li
Arjun Guha
Joydeep Biswas
SyDa
ALM
51
1
0
30 May 2024
Stress-Testing Capability Elicitation With Password-Locked Models
Ryan Greenblatt
Fabien Roger
Dmitrii Krasheninnikov
David M. Krueger
32
14
0
29 May 2024
Kotlin ML Pack: Technical Report
Sergey Titov
Mikhail Evtikhiev
Anton Shapkin
Oleg Smirnov
Sergei Boytsov
...
Dariia Karaeva
Maksim Sheptyakov
Mikhail Arkhipov
T. Bryksin
Egor Bogomolov
32
0
0
29 May 2024
Code Repair with LLMs gives an Exploration-Exploitation Tradeoff
Hao Tang
Keya Hu
Jin Peng Zhou
Sicheng Zhong
Wei-Long Zheng
Xujie Si
Kevin Ellis
34
13
0
26 May 2024
ChatGPT Code Detection: Techniques for Uncovering the Source of Code
Marc Oedingen
Raphael C. Engelhardt
Robin Denz
Maximilian Hammer
Wolfgang Konen
DeLMO
37
8
0
24 May 2024
Preble: Efficient Distributed Prompt Scheduling for LLM Serving
Vikranth Srivatsa
Zijian He
Reyna Abhyankar
Dongming Li
Yiying Zhang
52
17
0
08 May 2024
Better & Faster Large Language Models via Multi-token Prediction
Fabian Gloeckle
Badr Youbi Idrissi
Baptiste Rozière
David Lopez-Paz
Gabriele Synnaeve
24
93
0
30 Apr 2024
PECC: Problem Extraction and Coding Challenges
Patrick Haller
Jonas Golde
Alan Akbik
ReLM
32
5
0
29 Apr 2024
Exploring and Evaluating Hallucinations in LLM-Powered Code Generation
Fang Liu
Yang Liu
Lin Shi
Houkun Huang
Ruifeng Wang
Zhen Yang
Li Zhang
Zhongqi Li
Yuchi Ma
52
108
0
01 Apr 2024
DeFT: Decoding with Flash Tree-attention for Efficient Tree-structured LLM Inference
Jinwei Yao
Kaiqi Chen
Kexun Zhang
Jiaxuan You
Binhang Yuan
Zeke Wang
Tao Lin
35
2
0
30 Mar 2024
Semi-Instruct: Bridging Natural-Instruct and Self-Instruct for Code Large Language Models
Xianzhen Luo
Qingfu Zhu
Zhiming Zhang
Xu Wang
Qing Yang
Dongliang Xu
Wanxiang Che
ALM
32
2
0
01 Mar 2024
Large Language Models: A Survey
Shervin Minaee
Tomáš Mikolov
Narjes Nikzad
M. Asgari-Chenaghlu
R. Socher
Xavier Amatriain
Jianfeng Gao
ALM
LM&MA
ELM
122
369
0
09 Feb 2024
AI Control: Improving Safety Despite Intentional Subversion
Ryan Greenblatt
Buck Shlegeris
Kshitij Sachan
Fabien Roger
29
38
0
12 Dec 2023
CodeChain: Towards Modular Code Generation Through Chain of Self-revisions with Representative Sub-modules
Hung Le
Hailin Chen
Amrita Saha
Akash Gokul
Doyen Sahoo
Shafiq R. Joty
LRM
28
42
0
13 Oct 2023
Foundation Metrics for Evaluating Effectiveness of Healthcare Conversations Powered by Generative AI
Mahyar Abbasian
Elahe Khatibi
Iman Azimi
David Oniani
Zahra Shakeri Hossein Abad
...
Bryant Lin
Olivier Gevaert
Li-Jia Li
Ramesh C. Jain
Amir M. Rahmani
LM&MA
ELM
AI4MH
37
66
0
21 Sep 2023
Beyond Static Datasets: A Deep Interaction Approach to LLM Evaluation
Jiatong Li
Rui Li
Qi Liu
26
14
0
08 Sep 2023
Benchmarks for Detecting Measurement Tampering
Fabien Roger
Ryan Greenblatt
Max Nadeau
Buck Shlegeris
Nate Thomas
28
2
0
29 Aug 2023
Natural Language Generation and Understanding of Big Code for AI-Assisted Programming: A Review
M. Wong
Shangxin Guo
Ching Nam Hang
Siu-Wai Ho
C. Tan
42
78
0
04 Jul 2023
Personality Traits in Large Language Models
Gregory Serapio-García
Mustafa Safdari
Clément Crepy
Luning Sun
Stephen Fitz
P. Romero
Marwa Abdulhai
Aleksandra Faust
Maja J. Matarić
LM&MA
LLMAG
58
119
0
01 Jul 2023
Is Self-Repair a Silver Bullet for Code Generation?
Theo X. Olausson
J. Inala
Chenglong Wang
Jianfeng Gao
Armando Solar-Lezama
LRM
26
108
0
16 Jun 2023
SelfEvolve: A Code Evolution Framework via Large Language Models
Shuyang Jiang
Yuhao Wang
Yu Wang
16
32
0
05 Jun 2023
A New Era in Software Security: Towards Self-Healing Software via Large Language Models and Formal Verification
Norbert Tihanyi
Ridhi Jain
Yiannis Charalambous
M. Ferrag
Youcheng Sun
Lucas C. Cordeiro
21
48
0
24 May 2023
ALGO: Synthesizing Algorithmic Programs with LLM-Generated Oracle Verifiers
Kexun Zhang
Danqing Wang
Jingtao Xia
William Yang Wang
Lei Li
28
40
0
24 May 2023
Neural Machine Translation for Code Generation
K. Dharma
Clayton T. Morrison
32
4
0
22 May 2023
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning and Coding with LLMs
Pranjal Aggarwal
Aman Madaan
Yiming Yang
Mausam
LRM
28
36
0
19 May 2023
Think Outside the Code: Brainstorming Boosts Large Language Models in Code Generation
Xinyu Li
Jiang-Tian Xue
Zheng Xie
Ming Li
LRM
19
26
0
18 May 2023
The Vault: A Comprehensive Multilingual Dataset for Advancing Code Understanding and Generation
Dũng Nguyễn Mạnh
Nam Le Hai
An Dau
A. Nguyen
Khanh N. Nghiem
Jingnan Guo
Nghi D. Q. Bui
26
15
0
09 May 2023
Stochastic Code Generation
Swapnil Sharma
Nikita Anand
V. KranthiKiranG.
SyDa
22
0
0
14 Apr 2023
CodeGeeX: A Pre-Trained Model for Code Generation with Multilingual Benchmarking on HumanEval-X
Qinkai Zheng
Xiao Xia
Xu Zou
Yuxiao Dong
Shanshan Wang
...
Andi Wang
Yang Li
Teng Su
Zhilin Yang
Jie Tang
ELM
ALM
SyDa
52
316
0
30 Mar 2023
RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation
Fengji Zhang
B. Chen
Yue Zhang
Jacky Keung
Jin Liu
Daoguang Zan
Yi Mao
Jian-Guang Lou
Weizhu Chen
25
219
0
22 Mar 2023
Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints
Albert Lu
Hongxin Zhang
Yanzhe Zhang
Xuezhi Wang
Diyi Yang
LRM
32
28
0
17 Feb 2023
PAC Prediction Sets for Large Language Models of Code
Adam Khakhar
Stephen Mell
Osbert Bastani
20
6
0
17 Feb 2023
Execution-based Code Generation using Deep Reinforcement Learning
Parshin Shojaee
Aneesh Jain
Sindhu Tipirneni
Chandan K. Reddy
23
51
0
31 Jan 2023
Natural Language to Code Generation in Interactive Data Science Notebooks
Pengcheng Yin
Wen-Ding Li
Kefan Xiao
Abhishek Rao
Yeming Wen
...
Paige Bailey
Michele Catasta
Henryk Michalewski
Oleksandr Polozov
Charles Sutton
31
56
0
19 Dec 2022
Plansformer: Generating Symbolic Plans using Transformers
Vishal Pallagani
Bharath Muppasani
K. Murugesan
F. Rossi
L. Horesh
Biplav Srivastava
F. Fabiano
Andrea Loreggia
LM&Ro
LLMAG
OffRL
15
35
0
16 Dec 2022