GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents

29 May 2025
Manish Shetty
Naman Jain
Jinjian Liu
Vijay Kethanaboyina
Koushik Sen
Ion Stoica
    ELM

Papers citing "GSO: Challenging Software Optimization Tasks for Evaluating SWE-Agents"

30 / 30 papers shown
Challenges and Paths Towards AI for Software Engineering
Alex Gu
Naman Jain
Wen-Ding Li
Manish Shetty
Yijia Shao
Ziyang Li
Diyi Yang
Kevin Ellis
Koushik Sen
Armando Solar-Lezama
AI4CE
40
2
0
28 Mar 2025
LocAgent: Graph-Guided LLM Agents for Code Localization
Zhaoling Chen
Xiangru Tang
Gangda Deng
Fang Wu
Jialong Wu
Zhiwei Jiang
Viktor Prasanna
Arman Cohan
Xingyao Wang
LLMAG
121
5
0
12 Mar 2025
RepoST: Scalable Repository-Level Coding Environment Construction with Sandbox Testing
Yiqing Xie
Alex Xie
Divyanshu Sheth
Pengfei Liu
Daniel Fried
Carolyn Rose
LRM
81
2
0
10 Mar 2025
Commit0: Library Generation from Scratch
Wenting Zhao
Nan Jiang
Celine Lee
Justin T Chiu
Claire Cardie
Matthias Gallé
Alexander M. Rush
69
6
0
02 Dec 2024
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
55
29
0
04 Oct 2024
HybridFlow: A Flexible and Efficient RLHF Framework
Guangming Sheng
Chi Zhang
Zilingfeng Ye
Xibin Wu
Wang Zhang
Ru Zhang
Size Zheng
Haibin Lin
Chuan Wu
AI4CE
65
144
0
28 Sep 2024
Evaluating Language Models for Efficient Code Generation
Jiawei Liu
Songrun Xie
Junhao Wang
Yuxiang Wei
Yifeng Ding
Lingming Zhang
24
29
0
12 Aug 2024
A Performance Study of LLM-Generated Code on Leetcode
Tristan Coignion
Clément Quinton
Romain Rouvoy
79
31
0
31 Jul 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
82
158
0
22 Jun 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
65
346
0
12 Mar 2024
EffiBench: Benchmarking the Efficiency of Automatically Generated Code
Dong Huang
Yuhao Qing
Weiyi Shang
Heming Cui
Jie M. Zhang
94
35
0
03 Feb 2024
NoFunEval: Funny How Code LMs Falter on Requirements Beyond Functional Correctness
Manav Singhal
Tushar Aggarwal
Abhijeet Awasthi
Nagarajan Natarajan
Aditya Kanade
47
13
0
29 Jan 2024
CodeScope: An Execution-based Multilingual Multitask Multidimensional Benchmark for Evaluating LLMs on Code Understanding and Generation
Weixiang Yan
Haitian Liu
Yunkun Wang
Yunzhe Li
Qian Chen
...
Tingyu Lin
Weishan Zhao
Li Zhu
Hari Sundaram
Shuiguang Deng
ELM
LRM
59
37
0
14 Nov 2023
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
43
529
0
10 Oct 2023
Efficient Memory Management for Large Language Model Serving with PagedAttention
Woosuk Kwon
Zhuohan Li
Siyuan Zhuang
Ying Sheng
Lianmin Zheng
Cody Hao Yu
Joseph E. Gonzalez
Haotong Zhang
Ion Stoica
VLM
98
2,049
0
12 Sep 2023
Explaining Competitive-Level Programming Solutions using LLMs
Jierui Li
Szymon Tworkowski
Yingying Wu
Raymond J. Mooney
LRM
36
17
0
11 Jul 2023
Is Self-Repair a Silver Bullet for Code Generation?
Theo X. Olausson
J. Inala
Chenglong Wang
Jianfeng Gao
Armando Solar-Lezama
LRM
53
113
0
16 Jun 2023
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
86
1,044
0
31 May 2023
Is Your Code Generated by ChatGPT Really Correct? Rigorous Evaluation of Large Language Models for Code Generation
Jiawei Liu
Chun Xia
Yuyao Wang
Lingming Zhang
ELM
ALM
199
859
0
02 May 2023
Learning Performance-Improving Code Edits
Alex Shypula
Aman Madaan
Yiming Yang
Uri Alon
Jacob R. Gardner
Milad Hashemi
Graham Neubig
Parthasarathy Ranganathan
Osbert Bastani
Amir Yazdanbakhsh
SyDa
40
86
0
15 Feb 2023
Execution-Based Evaluation for Open-Domain Code Generation
Zhiruo Wang
Shuyan Zhou
Daniel Fried
Graham Neubig
ELM
55
83
0
20 Dec 2022
Natural Language to Code Generation in Interactive Data Science Notebooks
Pengcheng Yin
Wen-Ding Li
Kefan Xiao
Abhishek Rao
Yeming Wen
...
Paige Bailey
Michele Catasta
Henryk Michalewski
Oleksandr Polozov
Charles Sutton
38
60
0
19 Dec 2022
DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation
Yuhang Lai
Chengxi Li
Yiming Wang
Tianyi Zhang
Ruiqi Zhong
Luke Zettlemoyer
Scott Yih
Daniel Fried
Si-yi Wang
Tao Yu
ELM
ALM
67
321
0
18 Nov 2022
MultiPL-E: A Scalable and Extensible Approach to Benchmarking Neural Code Generation
Federico Cassano
John Gouwar
Daniel Nguyen
S. Nguyen
Luna Phipps-Costin
...
Carolyn Jane Anderson
Molly Q. Feldman
Arjun Guha
Michael Greenberg
Abhinav Jangda
ELM
61
86
0
17 Aug 2022
Competition-Level Code Generation with AlphaCode
Yujia Li
David Choi
Junyoung Chung
Nate Kushman
Julian Schrittwieser
...
Esme Sutherland Robson
Pushmeet Kohli
Nando de
Koray Kavukcuoglu
Oriol Vinyals
41
1,337
0
08 Feb 2022
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
72
1,846
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
132
5,328
0
07 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
221
657
0
20 May 2021
Meta Back-translation
Hieu H. Pham
Xinyi Wang
Yiming Yang
Graham Neubig
28
26
0
15 Feb 2021
Improving Neural Machine Translation Models with Monolingual Data
Rico Sennrich
Barry Haddow
Alexandra Birch
183
2,705
0
20 Nov 2015