InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback

26 June 2023
John Yang
Akshara Prabhakar
Karthik R. Narasimhan
Shunyu Yao

Papers citing "InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback"

50 / 83 papers shown
HiBayES: A Hierarchical Bayesian Modeling Framework for AI Evaluation Statistics
Lennart Luettgau
Harry Coppock
Magda Dubois
Christopher Summerfield
Cozmin Ududec
31
0
0
08 May 2025
Self-Generated In-Context Examples Improve LLM Agents for Sequential Decision-Making Tasks
Vishnu Sarukkai
Zhiqiang Xie
Kayvon Fatahalian
LLMAG
75
0
0
01 May 2025
CodeFlowBench: A Multi-turn, Iterative Benchmark for Complex Code Generation
Sizhe Wang
Zihan Wang
Dongsheng Ma
Yongan Yu
Rui Ling
Zehan Li
Zhiyu Li
Wenbo Zhang
LRM
65
0
0
30 Apr 2025
When Reasoning Beats Scale: A 1.5B Reasoning Model Outranks 13B LLMs as Discriminator
Md Fahim Anjum
LRM
34
0
0
30 Apr 2025
APIGen-MT: Agentic Pipeline for Multi-Turn Data Generation via Simulated Agent-Human Interplay
Akshara Prabhakar
Ziqiang Liu
Weiran Yao
Jianguo Zhang
Ming Zhu
...
Juan Carlos Niebles
Shelby Heinecke
Han Wang
Shri Kiran Srinivasan
Caiming Xiong
VGen
90
2
0
04 Apr 2025
Review, Refine, Repeat: Understanding Iterative Decoding of AI Agents with Dynamic Evaluation and Selection
Souradip Chakraborty
Mohammadreza Pourreza
Ruoxi Sun
Yiwen Song
Nino Scherrer
...
Furong Huang
Amrit Singh Bedi
Ahmad Beirami
Hamid Palangi
Tomas Pfister
53
0
0
02 Apr 2025
A Training-free LLM Framework with Interaction between Contextually Related Subtasks in Solving Complex Tasks
Hongjia Liu
Jinlong Li
LRM
54
0
0
29 Mar 2025
Synthetic Data Generation Using Large Language Models: Advances in Text and Code
Mihai Nadas
Laura Diosan
Andreea Tomescu
SyDa
72
2
0
18 Mar 2025
The KoLMogorov Test: Compression by Code Generation
Ori Yoran
Kunhao Zheng
Fabian Gloeckle
Jonas Gehring
Gabriel Synnaeve
Taco Cohen
64
1
0
18 Mar 2025
Code-Driven Inductive Synthesis: Enhancing Reasoning Abilities of Large Language Models with Sequences
Kedi Chen
Zhikai Lei
Fan Zhang
Yinqi Zhang
Qin Chen
Jie Zhou
Liang He
Qipeng Guo
K. Chen
Wei-na Zhang
ELM
LRM
70
0
0
17 Mar 2025
A Survey on the Optimization of Large Language Model-based Agents
Shangheng Du
Jiabao Zhao
Jinxin Shi
Zhentao Xie
Xin Jiang
Yanhong Bai
Liang He
LLMAG
LM&Ro
LM&MA
274
1
0
16 Mar 2025
A Framework for Evaluating Emerging Cyberattack Capabilities of AI
Mikel Rodriguez
Raluca Ada Popa
Four Flynn
Lihao Liang
Allan Dafoe
Anna Wang
ELM
71
5
0
14 Mar 2025
KodCode: A Diverse, Challenging, and Verifiable Synthetic Dataset for Coding
Zhangchen Xu
Yang Liu
Yueqin Yin
Mingyuan Zhou
Radha Poovendran
ALM
OffRL
84
9
0
04 Mar 2025
Improving Retrospective Language Agents via Joint Policy Gradient Optimization
Xueyang Feng
Bo Lan
Quanyu Dai
Lei Wang
Jiakai Tang
X. Chen
Zhenhua Dong
Zhicheng Dou
LLMAG
67
0
0
03 Mar 2025
ConvCodeWorld: Benchmarking Conversational Code Generation in Reproducible Feedback Environments
Hojae Han
Seung-won Hwang
Rajhans Samdani
Yuxiong He
ALM
73
2
0
27 Feb 2025
Code to Think, Think to Code: A Survey on Code-Enhanced Reasoning and Reasoning-Driven Code Intelligence in LLMs
Dayu Yang
Tianyang Liu
Daoan Zhang
Antoine Simoulin
Xiaoyi Liu
...
Zhaopu Teng
Xin Qian
Grey Yang
Jiebo Luo
Julian McAuley
ReLM
OffRL
LRM
89
4
0
26 Feb 2025
InterFeedback: Unveiling Interactive Intelligence of Large Multimodal Models via Human Feedback
Henry Hengyuan Zhao
Wenqi Pei
Yifei Tao
Haiyang Mei
Mike Zheng Shou
51
0
0
20 Feb 2025
ReFoRCE: A Text-to-SQL Agent with Self-Refinement, Format Restriction, and Column Exploration
Minghang Deng
Ashwin Ramachandran
Canwen Xu
Lanxiang Hu
Zhewei Yao
Anupam Datta
Hao Zhang
LMTD
139
1
0
02 Feb 2025
Towards Full Delegation: Designing Ideal Agentic Behaviors for Travel Planning
Song Jiang
Da JU
Andrew Cohen
Sasha Mitts
Aaron Foss
Justine T Kao
Xian Li
Yuandong Tian
67
3
0
21 Nov 2024
Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms
Minghe Gao
Wendong Bu
Bingchen Miao
Yang Wu
Yunfei Li
Juncheng Billy Li
Siliang Tang
Qi Wu
Yueting Zhuang
Meng Wang
LM&Ro
53
3
0
17 Nov 2024
Spider 2.0: Evaluating Language Models on Real-World Enterprise Text-to-SQL Workflows
Fangyu Lei
Jixuan Chen
Yuxiao Ye
Ruisheng Cao
Dongchan Shin
...
Caiming Xiong
Ruoxi Sun
Qian Liu
Sida I. Wang
Tao Yu
LMTD
82
21
0
12 Nov 2024
OSCAR: Operating System Control via State-Aware Reasoning and Re-Planning
Xiaoqiang Wang
Bang Liu
LLMAG
LM&Ro
LRM
54
6
0
24 Oct 2024
SecCodePLT: A Unified Platform for Evaluating the Security of Code GenAI
Yu Yang
Yuzhou Nie
Zhun Wang
Yuheng Tang
Wenbo Guo
Bo Li
D. Song
ELM
38
6
0
14 Oct 2024
FB-Bench: A Fine-Grained Multi-Task Benchmark for Evaluating LLMs' Responsiveness to Human Feedback
Heng Chang
Miao Zheng
Fan Yang
Guosheng Dong
Bin Cui
Xin Wu
Zenan Zhou
Wentao Zhang
ALM
51
6
0
12 Oct 2024
Catastrophic Cyber Capabilities Benchmark (3CB): Robustly Evaluating LLM Agent Cyber Offense Capabilities
Andrey Anurin
Jonathan Ng
Kibo Schaffer
Jason Schreiber
Esben Kran
ELM
40
5
0
10 Oct 2024
AgentBank: Towards Generalized LLM Agents via Fine-Tuning on 50000+ Interaction Trajectories
Yifan Song
Weimin Xiong
Xiutian Zhao
Dawei Zhu
Wenhao Wu
Ke Wang
Cheng Li
Wei Peng
Sujian Li
LLMAG
36
10
0
10 Oct 2024
DA-Code: Agent Data Science Code Generation Benchmark for Large Language Models
Yiming Huang
Jianwen Luo
Yan Yu
Yitong Zhang
Fangyu Lei
...
Shizhu He
Lifu Huang
Xiao Liu
Jun Zhao
Kang Liu
ELM
ALM
AI4CE
23
6
0
09 Oct 2024
Better than Your Teacher: LLM Agents that learn from Privileged AI Feedback
Sanjiban Choudhury
Paloma Sodhi
LLMAG
40
4
0
07 Oct 2024
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik R. Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
41
24
0
04 Oct 2024
MMMT-IF: A Challenging Multimodal Multi-Turn Instruction Following Benchmark
Elliot L. Epstein
Kaisheng Yao
Jing Li
Xinyi Bai
Hamid Palangi
LRM
47
0
0
26 Sep 2024
ScriptSmith: A Unified LLM Framework for Enhancing IT Operations via Automated Bash Script Generation, Assessment, and Refinement
Oishik Chatterjee
Pooja Aggarwal
Suranjana Samanta
Ting Dai
P. Mohapatra
...
Ruchi Mahindru
Steve Barbieri
Eugen Postea
Brad Blancett
Arthur De Magalhaes
28
2
0
12 Sep 2024
HyperAgent: Generalist Software Engineering Agents to Solve Coding Tasks at Scale
Huy N. Phan
Phong X. Nguyen
Nghi D. Q. Bui
LLMAG
33
12
0
09 Sep 2024
IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering
Ruosen Li
Barry Wang
Ruochen Li
Xinya Du
ELM
33
5
0
24 Aug 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM
MoE
OSLM
37
673
0
31 Jul 2024
Spider2-V: How Far Are Multimodal Agents From Automating Data Science and Engineering Workflows?
Ruisheng Cao
Fangyu Lei
Haoyuan Wu
Jixuan Chen
Yeqiao Fu
...
Qian Liu
Victor Zhong
Lu Chen
Kai Yu
Tao Yu
48
18
0
15 Jul 2024
CRAB: Cross-environment Agent Benchmark for Multimodal Language Model Agents
Tianqi Xu
Linyao Chen
Dai-Jie Wu
Yanjun Chen
Zecheng Zhang
...
Shilong Liu
Bochen Qian
Philip Torr
Guohao Li
Ge Li
57
14
0
01 Jul 2024
Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs
Lei Zhang
Yunshui Li
Jiaming Li
Xiaobo Xia
Jiaxi Yang
Run Luo
Minzheng Wang
Longze Chen
Junhao Liu
Min Yang
40
1
0
26 Jun 2024
DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents
Peter Alexander Jansen
Marc-Alexandre Côté
Tushar Khot
Erin Bransom
Bhavana Dalvi Mishra
Bodhisattwa Prasad Majumder
Oyvind Tafjord
Peter Clark
LLMAG
43
22
0
10 Jun 2024
Adaptive In-conversation Team Building for Language Model Agents
Linxin Song
Jiale Liu
Jieyu Zhang
Shaokun Zhang
Ao Luo
Shijian Wang
Qingyun Wu
Chi Wang
LLMAG
71
10
0
29 May 2024
IntelliExplain: Enhancing Interactive Code Generation through Natural Language Explanations for Non-Professional Programmers
Hao Yan
Thomas D. Latoza
Ziyu Yao
LRM
45
0
0
16 May 2024
Plot2Code: A Comprehensive Benchmark for Evaluating Multi-modal Large Language Models in Code Generation from Scientific Plots
Chengyue Wu
Yixiao Ge
Qiushan Guo
Jiahao Wang
Zhixuan Liang
Zeyu Lu
Ying Shan
Ping Luo
MLLM
VLM
36
0
0
13 May 2024
Revisiting Code Similarity Evaluation with Abstract Syntax Tree Edit Distance
Yewei Song
Cedric Lothritz
Daniel Tang
Tegawende F. Bissyande
Jacques Klein
54
9
0
12 Apr 2024
Best Practices and Lessons Learned on Synthetic Data for Language Models
Ruibo Liu
Jerry W. Wei
Fangyu Liu
Chenglei Si
Yanzhe Zhang
...
Steven Zheng
Daiyi Peng
Diyi Yang
Denny Zhou
Andrew M. Dai
SyDa
EgoV
43
86
0
11 Apr 2024
RoT: Enhancing Large Language Models with Reflection on Search Trees
Wenyang Hui
Kewei Tu
LRM
32
6
0
08 Apr 2024
The RealHumanEval: Evaluating Large Language Models' Abilities to Support Programmers
Hussein Mozannar
Valerie Chen
Mohammed Alsobay
Subhro Das
Sebastian Zhao
Dennis L. Wei
Manish Nagireddy
P. Sattigeri
Ameet Talwalkar
David Sontag
ELM
46
18
0
03 Apr 2024
StateFlow: Enhancing LLM Task-Solving through State-Driven Workflows
Yiran Wu
Tianwei Yue
Shaokun Zhang
Chi Wang
Qingyun Wu
48
21
0
17 Mar 2024
DevBench: A Comprehensive Benchmark for Software Development
Bowen Li
Wenhan Wu
Ziwei Tang
Lin Shi
John Yang
...
He Du
Ping Yang
Dahua Lin
Chao Peng
Kai Chen
99
10
0
13 Mar 2024
DACO: Towards Application-Driven and Comprehensive Data Analysis via Code Generation
Xueqing Wu
Rui Zheng
Jingzhen Sha
Te-Lin Wu
Hanyu Zhou
Mohan Tang
Kai-Wei Chang
Nanyun Peng
Haoran Huang
55
2
0
04 Mar 2024
Identify Critical Nodes in Complex Network with Large Language Models
Jinzhu Mao
Dongyun Zou
Li Sheng
Siyi Liu
Chen Gao
Yue Wang
Yong Li
45
3
0
01 Mar 2024
How Can LLM Guide RL? A Value-Based Approach
Shenao Zhang
Sirui Zheng
Shuqi Ke
Zhihan Liu
Wanxin Jin
Jianbo Yuan
Yingxiang Yang
Hongxia Yang
Zhaoran Wang
35
8
0
25 Feb 2024