Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2502.12115
Cited By
SWE-Lancer: Can Frontier LLMs Earn
1
M
i
l
l
i
o
n
f
r
o
m
R
e
a
l
−
W
o
r
l
d
F
r
e
e
l
a
n
c
e
S
o
f
t
w
a
r
e
E
n
g
i
n
e
e
r
i
n
g
?
1 Million from Real-World Freelance Software Engineering?
1
M
i
ll
i
o
n
f
ro
m
R
e
a
l
−
W
or
l
d
F
ree
l
an
ce
S
o
f
tw
a
re
E
n
g
in
eer
in
g
?
17 February 2025
Samuel Miserendino
Ming Wang
Tejal Patwardhan
Johannes Heidecke
Re-assign community
ArXiv
PDF
HTML
Papers citing
"SWE-Lancer: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering?"
20 / 20 papers shown
Title
WebGen-Bench: Evaluating LLMs on Generating Interactive and Functional Websites from Scratch
Zimu Lu
Yiran Yang
Houxing Ren
Haotian Hou
Han Xiao
Ke Wang
Weikang Shi
Aojun Zhou
Mingjie Zhan
Haoyang Li
LLMAG
69
0
0
06 May 2025
Cost-of-Pass: An Economic Framework for Evaluating Language Models
Mehmet Hamza Erol
Batu El
Mirac Suzgun
Mert Yuksekgonul
J. Zou
ELM
63
0
0
17 Apr 2025
OpenCodeInstruct: A Large-scale Instruction Tuning Dataset for Code LLMs
Wasi Uddin Ahmad
Aleksander Ficek
Mehrzad Samadi
Jocelyn Huang
Vahid Noroozi
Somshubra Majumdar
Boris Ginsburg
ALM
83
1
0
05 Apr 2025
Multi-SWE-bench: A Multilingual Benchmark for Issue Resolving
Daoguang Zan
Zhirong Huang
Wei Liu
Hanwu Chen
L. Zhang
...
Jing Su
Tianyu Liu
Rui Long
Kai Shen
Liang Xiang
95
3
0
03 Apr 2025
Z1: Efficient Test-time Scaling with Code
Zhaojian Yu
Yinghao Wu
Yilun Zhao
Arman Cohan
Xiao-Ping Zhang
LRM
81
13
0
01 Apr 2025
Survey on Evaluation of LLM-based Agents
Asaf Yehudai
Lilach Eden
Alan Li
Guy Uziel
Yilun Zhao
Roy Bar-Haim
Arman Cohan
Michal Shmueli-Scheuer
LLMAG
ELM
Presented at
ResearchTrend Connect | LLMAG
on
07 May 2025
154
11
0
20 Mar 2025
CodeElo: Benchmarking Competition-level Code Generation of LLMs with Human-comparable Elo Ratings
Shanghaoran Quan
Jiaxi Yang
Bowen Yu
Jian Xu
Dayiheng Liu
...
Zeyu Cui
Yang Fan
Yanzhe Zhang
Binyuan Hui
Junyang Lin
ALM
ELM
LRM
106
29
0
02 Jan 2025
SWE-bench Multimodal: Do AI Systems Generalize to Visual Software Domains?
John Yang
Carlos E. Jimenez
Alex Zhang
K. Lieret
Joyce Yang
...
Gabriel Synnaeve
Karthik Narasimhan
Diyi Yang
Sida I. Wang
Ofir Press
64
32
0
04 Oct 2024
SciCode: A Research Coding Benchmark Curated by Scientists
Minyang Tian
Luyu Gao
Shizhuo Dylan Zhang
Xinan Chen
Cunwei Fan
...
Tianhua Tao
Ofir Press
Jamie Callan
Eliu A. Huerta
Hao Peng
ELM
70
23
0
18 Jul 2024
BigCodeBench: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions
Terry Yue Zhuo
Minh Chien Vu
Jenny Chim
Han Hu
Wenhao Yu
...
David Lo
Daniel Fried
Xiaoning Du
H. D. Vries
Leandro von Werra
123
176
0
22 Jun 2024
NaturalCodeBench: Examining Coding Performance Mismatch on HumanEval and Natural User Prompts
Shudan Zhang
Hanlin Zhao
Xiao Liu
Qinkai Zheng
Zehan Qi
Xiaotao Gu
Xiaohan Zhang
Yuxiao Dong
Jie Tang
ELM
69
17
0
07 May 2024
Concept Induction: Analyzing Unstructured Text with High-Level Concepts Using LLooM
Michelle S. Lam
Janice Teoh
James A. Landay
Jeffrey Heer
Michael S. Bernstein
53
49
0
18 Apr 2024
LiveCodeBench: Holistic and Contamination Free Evaluation of Large Language Models for Code
Naman Jain
King Han
Alex Gu
Wen-Ding Li
Fanjia Yan
Tianjun Zhang
Sida I. Wang
Armando Solar-Lezama
Koushik Sen
Ion Stoica
ELM
94
395
0
12 Mar 2024
SWE-bench: Can Language Models Resolve Real-World GitHub Issues?
Carlos E. Jimenez
John Yang
Alexander Wettig
Shunyu Yao
Kexin Pei
Ofir Press
Karthik Narasimhan
ELM
85
576
0
10 Oct 2023
Goal Driven Discovery of Distributional Differences via Language Descriptions
Ruiqi Zhong
Peter Zhang
Steve Li
Jinwoo Ahn
Dan Klein
Jacob Steinhardt
80
51
0
28 Feb 2023
CodeGen: An Open Large Language Model for Code with Multi-Turn Program Synthesis
Erik Nijkamp
Bo Pang
Hiroaki Hayashi
Lifu Tu
Haiquan Wang
Yingbo Zhou
Silvio Savarese
Caiming Xiong
ELM
148
1,016
0
25 Mar 2022
Competition-Level Code Generation with AlphaCode
Yujia Li
David Choi
Junyoung Chung
Nate Kushman
Julian Schrittwieser
...
Esme Sutherland Robson
Pushmeet Kohli
Nando de
Koray Kavukcuoglu
Oriol Vinyals
116
1,379
0
08 Feb 2022
Program Synthesis with Large Language Models
Jacob Austin
Augustus Odena
Maxwell Nye
Maarten Bosma
Henryk Michalewski
...
Ellen Jiang
Carrie J. Cai
Michael Terry
Quoc V. Le
Charles Sutton
ELM
AIMat
ReCod
ALM
195
1,948
0
16 Aug 2021
Evaluating Large Language Models Trained on Code
Mark Chen
Jerry Tworek
Heewoo Jun
Qiming Yuan
Henrique Pondé
...
Bob McGrew
Dario Amodei
Sam McCandlish
Ilya Sutskever
Wojciech Zaremba
ELM
ALM
224
5,518
0
07 Jul 2021
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
248
679
0
20 May 2021
1