arXiv:2311.09835
ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code
16 November 2023
Xiangru Tang
Yuliang Liu
Zefan Cai
Yan Shao
Junjie Lu
Yichi Zhang
Zexuan Deng
Helan Hu
Kaikai An
Ruijun Huang
Shuzheng Si
Sheng Chen
Haozhe Zhao
Liang Chen
Yan Wang
Tianyu Liu
Zhiwei Jiang
Baobao Chang
Yiming Zong
Yujia Qin
Wangchunshu Zhou
Yilun Zhao
Arman Cohan
Mark B. Gerstein
ELM
LLMAG
Papers citing
"ML-Bench: Evaluating Large Language Models and Agents for Machine Learning Tasks on Repository-Level Code"
6 / 6 papers shown
Paper2Code: Automating Code Generation from Scientific Papers in Machine Learning
Minju Seo
Jinheon Baek
Seongyun Lee
Sung Ju Hwang
AI4CE
39
0
0
24 Apr 2025
BixBench: a Comprehensive Benchmark for LLM-based Agents in Computational Biology
Ludovico Mitchener
Jon M. Laurent
Benjamin Tenmann
Siddharth Narayanan
Geemi P Wellawatte
A. White
Lorenzo Sani
Samuel G. Rodriques
LLMAG
LM&MA
ELM
62
3
0
28 Feb 2025
AAAR-1.0: Assessing AI's Potential to Assist Research
Renze Lou
Hanzi Xu
Sijia Wang
Jiangshu Du
Ryo Kamoi
...
Xi Li
Kaipeng Zhang
Congying Xia
Lifu Huang
Wenpeng Yin
35
5
0
29 Oct 2024
SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering
John Yang
Carlos E. Jimenez
Alexander Wettig
K. Lieret
Shunyu Yao
Karthik Narasimhan
Ofir Press
LLMAG
101
191
0
06 May 2024
LLM vs. Lawyers: Identifying a Subset of Summary Judgments in a Large UK Case Law Dataset
Ahmed Izzidien
Holli Sargeant
Felix Steffek
AILaw
ELM
42
7
0
04 Mar 2024
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
208
624
0
20 May 2021